Abstract. We study the problem of non-asymptotic deviations between a
reference measure $\mu$ and its empirical version $L_n$, in the 1-Wasserstein
metric, under the standing assumption that $\mu$ satisfies a transport-entropy
inequality. We extend some results of F. Bolley, A. Guillin and C. Villani [7]
with simple proofs. Our methods are based on concentration inequalities and
extend to the general setting of measures on a Polish space. Deviation bounds
for the occupation measure of a contracting Markov chain in $W_1$ distance are
also given.

Throughout the text, several examples are worked out, including the cases
of Gaussian measures on separable Banach spaces, and laws of diffusion
processes.

1. Introduction

1.1. Generalities. In the whole paper, $(E, d)$ will denote a Polish space with
metric $d$, equipped with its Borel $\sigma$-field, and $\mathcal{P}(E)$ will
denote the set of probability measures over $E$. Consider
$\mu \in \mathcal{P}(E)$ and a sequence of i.i.d. variables $X_i$,
$1 \le i \le n$, with common law $\mu$. Let

(1)  $$L_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i}$$

denote the empirical measure associated with the i.i.d. sample
$(X_i)_{1 \le i \le n}$. Then, with probability 1, $L_n \to \mu$ as
$n \to +\infty$ (here the arrow denotes narrow convergence, or convergence
against all bounded continuous functions over $E$). This theorem is known as
the empirical law of large numbers or Glivenko-Cantelli theorem and is due in
this form to Varadarajan [31]. Quantifying the speed of convergence for an
appropriate notion of distance between probability measures is an old problem,
with notable importance in statistics. For many examples, we refer to the
book of Van der Vaart and Wellner [30] and the Saint-Flour course of P. Massart.

Our aim here is to study non-asymptotic deviations in 1-Wasserstein distance.
This is a problem of interest in the fields of statistics and numerical
probability. More specifically, we provide bounds for the quantity
$P(W_1(L_n, \mu) \ge t)$ for $t > 0$, i.e. we quantify the speed of convergence
of the variable $W_1(L_n, \mu)$ to 0 in probability.

This paper seeks to complement the work of F. Bolley, A. Guillin and C. Villani
in [7], where such estimates are obtained for measures supported in
$\mathbb{R}^d$. We sum up (part of) their result here. Suppose that, for some
$1 \le p \le 2$, $\mu$ is a probability measure on $\mathbb{R}^d$ that
satisfies a $T_p(C)$ transportation-entropy inequality, that is

$$W_p(\nu, \mu) \le \sqrt{C\, H(\nu|\mu)} \quad \text{for all } \nu \in \mathcal{P}_p(\mathbb{R}^d)$$

(see below for definitions). They obtain a non-asymptotic Gaussian deviation
estimate for the $p$-Wasserstein distance between the empirical and true
measures:

$$P(W_p(L_n, \mu) \ge t) \le C(t) \exp(-K n t^2).$$

This is an effective result: the constants $K$ and $C(t)$ may be explicitly
computed from the value of some square-exponential moment of $\mu$ and the
constant $C$ appearing in the transportation inequality.

The strategy used in [7] relies on a non-asymptotic version of (the upper bound
in) Sanov's theorem. Roughly speaking, Sanov's theorem states that the proper
rate function for the deviations of empirical measures is the entropy
functional, or in other words that for 'good' subsets
$A \subset \mathcal{P}(E)$,

$$P(L_n \in A) \asymp e^{-n H(A|\mu)}$$

where $H(A|\mu) = \inf_{\nu \in A} H(\nu|\mu)$ (see [10] for a full statement
of the theorem).

In a companion work [S], we derive sharper bounds for this problem, using a 
construction originally due to R.M. Dudley [T^. The interested reader may refer 
to [S] for a summary of existing results. Here, our purpose is to show that in
the case $p = 1$, the results of [7] can be recovered with simple arguments of
measure concentration, and to give various extensions of interest.

• We would like to consider spaces more general than $\mathbb{R}^d$.

• We would like to encompass a wide class of measures in a synthetic
treatment. In order to do so we will consider more general transportation
inequalities, see below.

• Another interesting feature is to extend the result to dependent sequences
such as the occupation measure of a Markov chain. This is a particularly
desirable feature in applications: one may wish to approximate a
distribution that is unknown, or from which it is practically impossible to
sample uniformly, but that is known to be the invariant measure of a
simulable Markov chain.

In the remainder of this section, we introduce the tools necessary in our
framework: transportation distances and transportation-entropy inequalities.
In Section 2 we give our main results, as well as explicit estimates in
several relevant cases. Section 3 is devoted to the proof of the main result.
Section 4 is devoted to the proof of Theorem 2.5. In Section 5 we show how our
strategy of proof can extend to the dependent case.

1.2. A short introduction to transportation inequalities.

1.2.1. Transportation costs and Wasserstein distances. We recall here basic
definitions and propositions; for proofs and a thorough account of this rich
theory, the reader may refer to [32]. Define $\mathcal{P}_p$,
$1 \le p < +\infty$, as the set of probability measures with a finite $p$-th
moment. The $p$-Wasserstein metric $W_p(\mu, \nu)$ between
$\mu, \nu \in \mathcal{P}_p$ is defined by

$$W_p^p(\mu, \nu) = \inf_{\pi} \int d^p(x, y)\, \pi(dx, dy),$$

where the infimum is on probability measures
$\pi \in \mathcal{P}(E \times E)$ with marginals $\mu$ and $\nu$. The topology
induced by this metric is slightly stronger than the weak topology: namely,
convergence of a sequence $(\mu_n)_{n \in \mathbb{N}}$ to a measure
$\mu \in \mathcal{P}_p$ in the $p$-Wasserstein metric is equivalent to the weak
convergence of the sequence together with the convergence of the $p$-th
moments of the measures $\mu_n$, $n \in \mathbb{N}$, to that of $\mu$.

We also recall the well-known Kantorovich-Rubinstein dual characterization of
$W_1$: let $\mathcal{F}$ denote the set of 1-Lipschitz functions
$f : E \to \mathbb{R}$ that vanish at some fixed point $x_0$. We have:

(2)  $$W_1(\mu, \nu) = \sup_{f \in \mathcal{F}} \int f\, d\mu - \int f\, d\nu.$$
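To make the definition of $W_1$ concrete, here is a minimal Python sketch for the special case $E = \mathbb{R}$ with $d(x, y) = |x - y|$ and two empirical measures carrying the same number $n$ of atoms. It relies on the standard one-dimensional fact, not stated in the text, that matching the sorted samples is an optimal coupling. The function name `w1_empirical_1d` is ours and purely illustrative.

```python
def w1_empirical_1d(xs, ys):
    """1-Wasserstein distance between two empirical measures on the real line.

    Assumption: both samples have the same size n, so every atom has mass 1/n.
    On the line, the monotone (sorted) matching is an optimal coupling.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(x - y) for x, y in pairs) / len(xs)
```

For instance, `w1_empirical_1d([0.0, 4.0], [1.0, 1.0])` averages the two transport distances 1 and 3. In higher dimension no such sorting shortcut is available and a linear program or a dedicated solver would be needed.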



1.2.2. Transportation-entropy inequalities. For a very complete overview of the
subject, the reader is invited to consult the review [TS]. More facts and
criteria are gathered in Appendix A. For $\mu, \nu \in \mathcal{P}(E)$, define
the relative entropy $H(\nu|\mu)$ as

$$H(\nu|\mu) = \int_E \log \frac{d\nu}{d\mu}\, \nu(dx)$$

if $\nu$ is absolutely continuous relatively to $\mu$, and
$H(\nu|\mu) = +\infty$ otherwise. Let $\alpha : [0, +\infty) \to \mathbb{R}$
denote a convex, increasing, left-continuous function such that
$\alpha(0) = 0$.

Definition 1.1. We say that $\mu \in \mathcal{P}_p(E)$ satisfies a $T_p(C)$
inequality for some $C > 0$ if for all $\nu \in \mathcal{P}_p(E)$,

(3)  $$W_p(\mu, \nu) \le \sqrt{C\, H(\nu|\mu)}.$$

We say that $\mu \in \mathcal{P}(E)$ satisfies a $\alpha(T_d)$ inequality if
for all $\nu \in \mathcal{P}(E)$,

(4)  $$\alpha(W_1(\mu, \nu)) \le H(\nu|\mu).$$

Observe that $T_1(C)$ inequalities are particular cases of $\alpha(T_d)$
inequalities with $\alpha(t) = \frac{1}{C} t^2$. From here on, our focus will
be on $\alpha(T_d)$ inequalities.






EMMANUEL BOISSARD 



2. Results and applications 

2.1. General bounds in the independent case. Let us first introduce some
notation: if $K \subset E$ is compact and $x_0 \in K$, we define the set
$\mathcal{F}_K$ of 1-Lipschitz functions over $K$ vanishing at $x_0$, which is
also compact w.r.t. the uniform distance (as a consequence of the
Ascoli-Arzelà theorem). We will also need the following definition:

Definition 2.1. Let $(A, d)$ be a totally bounded metric space. For every
$\delta > 0$, define the covering number $\mathcal{N}(A, \delta)$ of order
$\delta$ for $A$ as the minimal number of balls of radius $\delta$ needed to
cover $A$.
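As an illustration of Definition 2.1, the following Python sketch builds a cover of a finite point set by closed balls of radius $\delta$ with a greedy rule. It is only a sketch under our own assumptions: the set is finite, the centers are restricted to points of the set, and the number of centers returned is an upper bound on the covering number, not the minimal number itself. The name `greedy_cover` is hypothetical.

```python
def greedy_cover(points, delta, dist=lambda x, y: abs(x - y)):
    """Return centers such that closed balls of radius delta cover `points`.

    Greedy rule: a point becomes a new center only if it is farther than
    delta from every center chosen so far. The result upper-bounds the
    covering number of order delta; it is not guaranteed to be minimal.
    """
    centers = []
    for p in points:
        if all(dist(p, c) > delta for c in centers):
            centers.append(p)
    return centers
```

On the points 0, 0.1, 0.2, 1 with $\delta = 0.15$ the greedy rule keeps three centers, whereas two balls (centered at 0.1 and 1) already suffice, which shows why the output is only an upper bound.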

We state our first result in a fairly general fashion. 

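Theorem 2.2. Suppose that $\mu \in \mathcal{P}(E)$ satisfies a $\alpha(T_d)$
inequality. Let $a > 0$ be such that
$E_{a,1} = \int e^{a d(x_0, x)} \mu(dx) \le 2$. Choose a compact
$K \subset E$ such that

$$\mu(K^c) \le \left( \frac{32}{at} \log \frac{32}{at} - \frac{32}{at} + 1 \right)^{-1}.$$

Denote

(5)  $$C_t = \mathcal{N}(\mathcal{F}_K, t/8).$$

We have

(6)  $$P(W_1(L_n, \mu) \ge t) \le \exp\left( -n \alpha\left( t/2 - \Gamma(C_t, n) \right) \right)$$

where $\Gamma(C_t, n) = \inf_{\lambda > 0} \frac{1}{\lambda} [\log C_t + n \alpha^{\circledast}(\lambda/n)]$,
and with the convention that $\alpha(x) = 0$ for $x < 0$.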
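The requirement on the compact set $K$ in Theorem 2.2 goes through the function $x \mapsto x \log x - x + 1$ evaluated at $32/(at)$ (compare the choice of $K$ in the proof in Section 3, where its inverse $u$ appears). The Python sketch below evaluates the largest admissible value of $\mu(K^c)$ for given $a$ and $t$, and inverts the function by bisection. The names `tau`, `u` and `max_mass_outside` are ours, and the restriction $at < 32$ is an assumption we add so that the logarithmic expression is positive.

```python
import math

def tau(x):
    """x*log(x) - x + 1 for x > 1, and 0 otherwise."""
    return x * math.log(x) - x + 1.0 if x > 1.0 else 0.0

def u(y, tol=1e-12):
    """Inverse of tau on [1, +inf), computed by bisection (tau increases there)."""
    if y < 0:
        raise ValueError("y must be non-negative")
    lo, hi = 1.0, 2.0
    while tau(hi) < y:          # grow the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol * hi:   # standard bisection
        mid = 0.5 * (lo + hi)
        if tau(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def max_mass_outside(a, t):
    """Largest value of mu(K^c) allowed by the condition on K, assuming a*t < 32."""
    if a <= 0 or t <= 0 or a * t >= 32.0:
        raise ValueError("this sketch assumes a > 0, t > 0 and a*t < 32")
    return 1.0 / tau(32.0 / (a * t))
```

With this convention, `u(1.0 / max_mass_outside(a, t))` returns $32/(at)$ up to numerical error, which is the form in which the condition is used in the proof.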

Remark. With a mild change in the proof, one may replace in (6) the term $t/2$
by $ct$ for any $c < 1$, with the trade-off of choosing a larger compact set,
and thus a larger value of $C_t$. For the sake of readability we do not make
further mention of this.

The result in its general form is abstruse, but it yields interesting results
as soon as one knows more about $\alpha$. Let us give a few examples.

Corollary 2.3. If $\mu$ satisfies $T_1(C)$, we have

$$P(W_1(L_n, \mu) \ge t) \le C_t \exp\left( -\frac{1}{8C} n t^2 \right).$$

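Corollary 2.4. Suppose that $\mu$ verifies the modified transport inequality

$$W_1(\nu, \mu) \le C \left( H(\nu|\mu) + \sqrt{H(\nu|\mu)} \right)$$

(as observed in paragraph A.0.1, this is equivalent to the finiteness of an
exponential moment for $\mu$). Then, for $t \le C/2$,

$$P(W_1(L_n, \mu) \ge t) \le A(n, t) \exp\left( -\frac{(\sqrt{2} - 1)^2}{2C^2} n t^2 \right)$$

where $A(n, t)$ is an explicit factor of the form $\exp[\,\cdot\,]$ involving
$\log C_t$ and $n$ (observe that $A(n, t) \to C_t$ when $n \to +\infty$).

Proof of Corollary 2.3. In this case, we have $\alpha(t) = \frac{1}{C} t^2$,
and so

$$\Gamma(C_t, n) = \sqrt{\frac{C}{n} \log C_t},$$

so that we get

$$P(W_1(L_n, \mu) \ge t) \le \exp\left( -\frac{n}{C} \left( \frac{t}{2} - \sqrt{\frac{C}{n} \log C_t} \right)^2 \right)$$

and conclude with the elementary inequality $(a - b)^2 \ge \frac{1}{2} a^2 - b^2$.

□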
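The computation of $\Gamma(C_t, n)$ in the $T_1(C)$ case can be checked numerically. The sketch below assumes that $\alpha^{\circledast}$ denotes the monotone conjugate $\alpha^{\circledast}(s) = \sup_{t \ge 0} (st - \alpha(t))$, which for $\alpha(t) = t^2/C$ gives $\alpha^{\circledast}(s) = C s^2/4$; this reading of the notation is our assumption, since the conjugate is not defined in this section. Under that assumption, the infimum over $\lambda$ is attained at $\lambda = 2\sqrt{n \log C_t / C}$ and equals $\sqrt{C \log C_t / n}$. The function names are illustrative.

```python
import math

def objective(lam, C, log_ct, n):
    """(1/lam) * [log C_t + n * alpha_conj(lam/n)] with alpha_conj(s) = C*s^2/4."""
    return (log_ct + n * C * (lam / n) ** 2 / 4.0) / lam

def gamma_closed_form(C, log_ct, n):
    """Closed-form value sqrt(C * log C_t / n) of the infimum over lam > 0."""
    return math.sqrt(C * log_ct / n)

def gamma_numeric(C, log_ct, n, steps=90001):
    """Brute-force minimum of the objective over a log-spaced grid in [1e-3, 1e6]."""
    best = float("inf")
    for k in range(steps):
        lam = 10.0 ** (-3.0 + 9.0 * k / (steps - 1))
        best = min(best, objective(lam, C, log_ct, n))
    return best
```

For example, with $C = 2$, $\log C_t = 5$ and $n = 100$, both functions return a value close to $\sqrt{0.1}$. The grid search is deliberately naive; it only serves to confirm the closed form.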



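Proof of Corollary 2.4. Here,
$\alpha(x) = \frac{1}{4} \left( \sqrt{1 + 4x/C} - 1 \right)^2$, and
$\Gamma(C_t, n)$ can be bounded explicitly in terms of $\log C_t$ and $n$.

By concavity of the square root function, for $u \le 1$, we have
$\sqrt{1 + u} - 1 \ge (\sqrt{2} - 1) u$. Thus, for $t \le C/2$, we have

$$\alpha\left( \frac{t}{2} - \Gamma(C_t, n) \right) \ge \frac{4 (\sqrt{2} - 1)^2}{C^2} \left( \frac{t}{2} - \Gamma(C_t, n) \right)^2 \ge \frac{(\sqrt{2} - 1)^2}{2 C^2} t^2 - \frac{4 (\sqrt{2} - 1)^2}{C^2} \Gamma(C_t, n)^2$$

(in the last line we have used again the inequality
$(a - b)^2 \ge \frac{1}{2} a^2 - b^2$). This in turn gives the announced
result.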

□ 

Our technique of proof, though related to the one in [7], is based on different
arguments: we make use of the tensorization properties of transportation
inequalities as well as the estimates (QUI) in the spirit of Bobkov-Götze,
instead of a Sanov-type bound. The notion that is key here is the phenomenon of
concentration of measure (see e.g. [H]): its relevance in statistics was put
forth very explicitly in [24]. We may sum up our approach as follows: first, we
rely on existing tensorization results to obtain concentration of
$W_1(L_n, \mu)$ around its mean $E[W_1(L_n, \mu)]$, and then we estimate the
decay of the mean as $n \to +\infty$. Despite technical difficulties, the
arguments are mostly elementary.

The next theorem is a variation on Corollary 2.3. Its proof is based on
different arguments, and it is postponed to Section 4. We will use this theorem
to obtain bounds for Gaussian measures in Theorem 2.5.

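Theorem 2.5. Let $\mu \in \mathcal{P}(E)$ satisfy a $T_1(C)$ inequality. Then:

$$P(W_1(\mu, L_n) \ge t) \le K_t e^{-n t^2 / (8C)}$$

where

$$K_t = \exp\left( \frac{1}{C} \inf_{\nu} \mathrm{Card}(\mathrm{Supp}\, \nu)\, (\mathrm{Diam}\, \mathrm{Supp}\, \nu)^2 \right)$$

and $\nu$ runs over all probability measures with finite support such that
$W_1(\mu, \nu) \le t/4$.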

Remark. As earlier, we could improve the factor $1/8C$ in the statement above
to any constant $c < 1/C$, with the trade-off of a larger constant $K_t$.

2.2. Comments. We give some comments on the pertinence of the results above.
First of all, we argue that the asymptotic order of magnitude of our estimates
is the correct one. The term "asymptotic" here means that we consider the
regime $n \to +\infty$, and the relevant tool in this setting is Sanov's large
deviation principle for empirical measures. A technical point needs to be
stressed: there are several variations of Sanov's theorem, and the most common
ones (see e.g. [10]) deal with the weak topology on probability measures. What
we require is a version of the principle that holds for the stronger topology
induced by the 1-Wasserstein metric, which leads to slightly more stringent
assumptions on the measure than in Theorem 2.2. With this in mind, we quote the
following result from Wang [33]:

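Proposition 2.6. Suppose that $\mu \in \mathcal{P}(E)$ satisfies
$\int e^{a d(x, x_0)} \mu(dx) < +\infty$ for all $a > 0$ and some
$x_0 \in E$, and a $\alpha(T_d)$ inequality. Then:

• for all $A \subset \mathcal{P}(E)$ closed for the $W_1$ topology,

$$\limsup_{n \to +\infty} \frac{1}{n} \log P(L_n \in A) \le -\inf_{\nu \in A} H(\nu|\mu)$$

• for all $B \subset \mathcal{P}(E)$ open w.r.t. the $W_1$ topology,

$$\liminf_{n \to +\infty} \frac{1}{n} \log P(L_n \in B) \ge -\inf_{\nu \in B} H(\nu|\mu).$$

Consider the closed set $A = \{\nu \in \mathcal{P}(E),\ W_1(\mu, \nu) \ge t\}$;
then, according to the above and to the $\alpha(T_d)$ inequality, we have

$$\limsup_{n \to +\infty} \frac{1}{n} \log P(W_1(L_n, \mu) \ge t) \le -\alpha(t).$$

With Theorem 2.2 (and the remark following it), we obtain the bound

$$\limsup_{n \to +\infty} \frac{1}{n} \log P(W_1(L_n, \mu) \ge t) \le -\alpha(ct)$$

for all $c < 1$, and since $\alpha$ is left-continuous, we indeed obtain the
same asymptotic bound as from Sanov's theorem.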

Let us come back to the non-asymptotic regime. When we assume for example a
$T_1$ inequality, we get a bound in the form
$P(W_1(L_n, \mu) \ge t) \le C(t) e^{-K n t^2}$ involving the large constant
$C(t)$. By the Kantorovich-Rubinstein dual formulation of $W_1$, this amounts
to simultaneous deviation inequalities for all 1-Lipschitz observables. We
recall briefly the well-known fact that it is fairly easy to obtain a
deviation inequality for one Lipschitz observable without a constant depending
on the deviation scale $t$. Indeed, consider a 1-Lipschitz function $f$ and a
sequence $X_i$ of i.i.d. variables with law $\mu$. By Chebyshev's bound, for
$\theta > 0$,

$$P\left( \frac{1}{n} \sum_{i=1}^{n} f(X_i) - \int f\, d\mu \ge \varepsilon \right) \le \exp\left( -n \left[ \theta \varepsilon - \log \int \mu(dx)\, e^{\theta (f(x) - \int f\, d\mu)} \right] \right).$$

According to Bobkov-Götze's dual characterization of $T_1$, the term inside the
log is bounded above by $e^{C \theta^2}$, for some positive $C$, whence
$P(\frac{1}{n} \sum f(X_i) - \int f\, d\mu \ge \varepsilon) \le \exp(-n [\theta \varepsilon - C \theta^2])$.
Finally, take $\theta = \varepsilon / (2C)$, which maximizes
$\theta \varepsilon - C \theta^2$, to get

$$P\left( \frac{1}{n} \sum_{i=1}^{n} f(X_i) - \int f\, d\mu \ge \varepsilon \right) \le \exp\left( -\frac{n \varepsilon^2}{4C} \right).$$

Thus, we may see the multiplicative constant that we obtain as the price to pay
for uniform deviation estimates on all Lipschitz observables.

2.3. Examples of application. For practical purposes, it is important to give
the order of magnitude of the multiplicative constant $C_t$ depending on $t$.
We address this question on several important examples in this paragraph.

2.3.1. The $\mathbb{R}^d$ case.

Example 2.7. Denote
$\theta(x) = 32x \log\left[ 2 (32x \log 32x - 32x + 1) \right]$. In the case
$E = \mathbb{R}^d$, the numerical constant $C_t$ appearing in Theorem 2.2
satisfies:



(7) 



Cf < 2 



at 



2 at 



where $C_d$ only depends on $d$. In particular, for all $t \le \frac{1}{2a}$
there exist numerical constants $C_1$ and $C_2$ such that

$$C_t \le C_1 \left( 1 + \frac{1}{at} \log \frac{1}{at} \right) e^{C_d C_2 \left( \frac{1}{at} \log \frac{1}{at} \right)^d}.$$

Remark. The constants $C_d$, $C_1$, $C_2$ may be explicitly determined from the
proof. We do not do so and only state that $C_d$ grows exponentially with $d$.

Proof. For a measure $\mu \in \mathcal{P}(\mathbb{R}^d)$, a convenient natural
choice for a compact set of large measure is a Euclidean ball. Denote
$B_R = \{x \in \mathbb{R}^d,\ |x| \le R\}$. We will denote by $C_d$ a constant
depending only on the dimension $d$, that may change from line to line. Suppose
that $\mu$ satisfies the assumptions in Theorem 2.2. By Chebyshev's bound,
$\mu(B_R^c) \le 2 e^{-aR}$, so we may choose $K = B_{R_t}$ with

$$R_t \ge \frac{1}{a} \log\left[ 2 \left( \frac{32}{at} \log \frac{32}{at} - \frac{32}{at} + 1 \right) \right].$$

Next, the covering numbers for $B_R$ are bounded by:

$$\mathcal{N}(B_R, \delta) \le C_d \left( \frac{R}{\delta} \right)^d.$$

Using the bound ([^^ of Proposition B.2 we have

Ct< 2 



32^ 

t 



Cd 



i2Rt 
t 



This concludes the proof for the first part of the proposition. The second
claim derives from the fact that for $x \ge 2$, there exists a numerical
constant $k$ such that $\theta(x) \le k x \log x$.

□ 

Example 2.7 improves slightly upon the result for the $W_1$ metric in [7]. One
may wonder whether this order of magnitude is close to optimality. It is in
fact not sharp, and we point out where better results may be found.

In the case $d = 1$, $W_1(L_n, \mu)$ is bounded above by the
Kolmogorov-Smirnov divergence $\sup_{x \in \mathbb{R}} |F_n(x) - F(x)|$ where
$F_n$ and $F$ denote respectively the cumulative distribution functions
(c.d.f.) of $L_n$ and $\mu$. As a consequence of the celebrated
Dvoretzky-Kiefer-Wolfowitz theorem (see [551, [30]), we have the following: if
$\mu \in \mathcal{P}(\mathbb{R})$ has a continuous c.d.f., then

$$P(W_1(L_n, \mu) \ge t) \le 2 e^{-2 n t^2}.$$

The behaviour of the Wasserstein distance between empirical and true
distribution in one dimension has been very thoroughly studied by del Barrio,
Giné, Matrán, see [5].

In dimensions greater than 1, the result is also not sharp. Integrating (7),
one recovers a bound of the type
$E(W_1(L_n, \mu)) \le C n^{-1/(d+2)} (\log n)^{c}$. Looking into the proof of
our main result, one sees that any improvement of this bound will
automatically give a sharper result than (7). For the uniform measure over the
unit cube, results have been known for a while. The pioneering work in this
framework is the celebrated article of Ajtai, Komlós and Tusnády [T].
M. Talagrand f5S] showed that when $\mu$ is the uniform distribution on the
unit cube (in which case it clearly satisfies a $T_1$ inequality) and
$d \ge 3$, there exists $c_d \le C_d$ such that

$$c_d n^{-1/d} \le E W_1(L_n, \mu) \le C_d n^{-1/d}.$$

Sharp results for general measures are much more recent: as a consequence of
the results of F. Barthe and C. Bordenave [3], one may get an estimate of the
type $E W_1(L_n, \mu) \le c n^{-1/d}$ under some polynomial moment condition on
$\mu$.

2.3.2. A first bound for Standard Brownian motion. We wish now to illustrate
our results on an infinite-dimensional case. A first natural candidate is the
law of the standard Brownian motion, with the sup-norm as reference metric. The
natural idea that we put in place in this paragraph is to choose as large
compact sets the $\alpha$-Hölder balls, which are compact for the sup-norm.
However the remainder of this paragraph serves mainly an illustrative purpose:
we will obtain sharper results, valid for general Gaussian measures on
(separable) Banach spaces, in paragraph 2.3.4.

We consider the canonical Wiener space
$(C([0, 1], \mathbb{R}), \gamma, \|\cdot\|_\infty)$, where $\gamma$ denotes the
Wiener measure, under which the coordinate process
$B_t : \omega \mapsto \omega(t)$ is a standard Brownian motion.

Example 2.8. Denote by $\gamma$ the Wiener measure on
$(C([0, 1], \mathbb{R}), \|\cdot\|_\infty)$, and for $\alpha < 1/2$, define
= 2 _ 24/(l-2a) Il^ll4/(l-2a) 

where $\|Z\|_p$ denotes the $L^p$ norm of a $\mathcal{N}(0, 1)$ variable $Z$.
There exists $k > 0$ such that for every $t \le 144 / \sqrt{2 \log 2}$,
$\gamma$ satisfies

$$P(W_1(L_n, \gamma) \ge t) \le C_t e^{-n t^2 / 64}$$

with 



Ct<expexp(/fcC„^i^)i/". 
Proof. For $0 < \alpha \le 1$, define the $\alpha$-Hölder semi-norm as

$$|x|_\alpha = \sup_{0 \le s < t \le 1} \frac{|x(t) - x(s)|}{|t - s|^\alpha}.$$

Let $0 < \alpha \le 1$ and denote by $\mathcal{C}_\alpha$ the Banach space of
$\alpha$-Hölder continuous functions vanishing at 0, endowed with the norm
$\|\cdot\|_\alpha$. It is a classical fact that the Wiener measure is
concentrated on $\mathcal{C}_\alpha$ for all $\alpha \in\, ]0, 1/2[$. By
Ascoli-Arzelà's theorem, $\mathcal{C}_\alpha$ is compactly embedded in
$C([0, 1], \mathbb{R})$, or in other words the $\alpha$-Hölder balls
$B(\alpha, R) = \{x \in C([0, 1], \mathbb{R}),\ \|x\|_\alpha \le R\}$ are
totally bounded for the uniform norm. This makes $B(\alpha, R)$ good candidates
for compact spaces of large measure. We need to evaluate how big
$B(\alpha, R)$ is w.r.t. $\gamma$.

To this end we use the fact that the Wiener measure is also a Gaussian measure
on $\mathcal{C}_\alpha$ (see [5]). Therefore Lemma D.1 applies: denote

= Esup \\Bt\\a, si = E(sup ||Bt|U)^ 



we have 

for R > ma- Choosing 

(8) Rt > rria 

guarantees that 



24log2(— log— - — + 1) 
at at at 



1/2 



,«a,«.r)<(f.ogf-f + l 

On the other hand, according to Corollary C.2, $m_\alpha$ and $s_\alpha$ are
bounded by $C_\alpha$. And Lemma D.3 shows that choosing
$a = \sqrt{2 \log 2} / 3$ ensures $E e^{a \sup_t |B_t|} \le 2$. Elementary
computations show that for $t \le 144 / \sqrt{2 \log 2}$, we can pick

$$R_t = 3 C_\alpha \sqrt{\log\left( 96 / (\sqrt{2 \log 2}\, t) \right)}$$

to comply with the requirement in (8).

Bounds for the covering numbers in $\alpha$-Hölder balls are computed in [B]:



(9) 



Af(B(a,R),S) < 10-exp 





log(3)5^ 



We recover the (unpretty !) bound 



Ct < 2(1 + 96^ 0og 96/(^2 log 2t)) 



exp 



240 log 2 — A/log96/(\/21og2t) 



X explog3 ( 120^ y log 96/(^2 log 2t) 



l/a' 



The final claim in the Proposition is obtained by elementary majorizations. 



□ 



2.3.3. Paths of S.D.E.s. H. Djellout, A. Guillin and L. Wu established a $T_1$
inequality for paths of S.D.E.s that allows us to work as in the case of
Brownian motion. We quote their result from [17].

Consider the S.D.E. on $\mathbb{R}^d$

(10)  $$dX_t = b(X_t)\, dt + \sigma(X_t)\, dB_t, \quad X_0 = x_0 \in \mathbb{R}^d$$

with $b : \mathbb{R}^d \to \mathbb{R}^d$,
$\sigma : \mathbb{R}^d \to \mathcal{M}_{d \times m}$ and $(B_t)$ a standard
$m$-dimensional Brownian motion. We assume that $b$ and $\sigma$ are locally
Lipschitz and that for all $x, y \in \mathbb{R}^d$,

$$\sup_x \sqrt{\mathrm{tr}\, \sigma(x)^t \sigma(x)} \le A, \qquad \langle y - x,\ b(y) - b(x) \rangle \le B (1 + |y - x|^2).$$

For each starting point $x$ it has a unique non-explosive solution denoted
$(X_t(x))_{t \ge 0}$ and we denote its law on $C([0, 1], \mathbb{R}^d)$ by
$P_x$.

Theorem 2.9 ([17]). Assume the conditions above. There exists $C$ depending on
$A$ and $B$ only such that for every $x \in \mathbb{R}^d$, $P_x$ satisfies a
$T_1(C)$ inequality on the space $C([0, 1], \mathbb{R}^d)$ endowed with the
sup-norm.

We will now state our result. A word of caution: for the sake of readability,
the following computations are neither optimized nor made fully explicit.
However it should be a simple, though dull, task for the reader to track the
dependence of the numerical constants on the parameters.

From now on we make the simplifying assumption that the drift coefficient is
globally bounded by $B$ (this assumption is certainly not minimal).

Example 2.10. Let $\mu$ denote the law of the solution of the S.D.E. (10) on
the Banach space $C([0, 1], \mathbb{R}^d)$ endowed with the sup-norm. Let $C$
be such that $\mu$ satisfies $T_1(C)$. For all $0 < \alpha < 1/2$ there exist
$C_\alpha$ and $c$ depending only on $A$, $B$, $\alpha$ and $d$, and such that
for $t \le c$,

$$P(W_1(L_n, \mu) \ge t) \le C_t e^{-n t^2 / (8C)}$$

and 



Ct < exp exp 



l + l/2a /j^^ -1+3/2q' 



Proof. The proof goes along the same lines as the Brownian motion case, so we
only outline the important steps. First, there exists $a$ depending explicitly
on $A$, $B$, $d$ such that $E_{P_x} e^{a \|X_\cdot\|_\infty} \le 2$: this can
be seen by checking that the proof of Djellout-Guillin-Wu actually gives the
value of a Gaussian moment for $\mu$ as a function of $A$, $B$, $d$, and using
standard bounds.

Corollary C.3 applies for $\alpha < 1/2$ and $p$ such that
$1/p = 1/2 - \alpha$: there exists $C' < +\infty$ depending explicitly on $A$,
$B$, $\alpha$, $d$, such that $E \|X\|_\alpha^p \le C'$. Consequently,

$$\mu(B(\alpha, R)^c) \le C' / R^p.$$

So choosing

$$R \ge \left[ C' \left( \frac{32}{at} \log \frac{32}{at} - \frac{32}{at} + 1 \right) \right]^{1/p}$$

guarantees that

$$\mu(B(\alpha, R)^c) \le \left( \frac{32}{at} \log \frac{32}{at} - \frac{32}{at} + 1 \right)^{-1}.$$

For $t \le c$ small enough, $R \le C'' (\frac{1}{t} \log \frac{1}{t})^{1/p}$
with $c$, $C''$ depending on $A$, $B$, $\alpha$, $d$. The conclusion is reached
again by using estimate (9) on the covering numbers of Hölder balls.

□ 



2.3.4. Gaussian r.v.s in Banach spaces. In this paragraph we apply Theorem 2.5
to the case where $E$ is a separable Banach space with norm $\|\cdot\|$, and
$\mu$ is the law of a centered Gaussian random variable with values in $E$,
meaning that the image of $\mu$ by every continuous linear functional
$f \in E^*$ is a centered Gaussian variable in $\mathbb{R}$. The couple
$(E, \mu)$ is said to be a Gaussian Banach space.

Let $X$ be an $E$-valued r.v. with law $\mu$, and define the weak variance of
$\mu$ as

$$\sigma = \sup_{f \in E^*,\ \|f\| \le 1} \left( E f^2(X) \right)^{1/2}.$$

The small ball function of a Gaussian Banach space $(E, \mu)$ is the function

$$\psi(t) = -\log \mu(B(0, t)).$$

We can associate to the couple $(E, \mu)$ their Cameron-Martin Hilbert space
$H \subset E$, see e.g. [20] for a reference. It is known that the small ball
function has deep links with the covering numbers of the unit ball of $H$, see
e.g. Kuelbs-Li [19] and Li-Linde [22], as well as with the approximation of
$\mu$ by measures with finite support in Wasserstein distance (the quantization
or optimal quantization problem), see Fehringer's Ph.D. thesis [13],
Dereich-Fehringer-Matoussi-Scheutzow, Graf-Luschgy-Pagès [IB]. It should thus
come as no surprise that we can give a bound on the constant $K_t$ depending
solely on $\psi$ and $\sigma$. This is the content of the next example.

Example 2.11. Let $(E, \mu)$ be a Gaussian Banach space. Denote by $\psi$ its
small ball function and by $\sigma$ its weak variance. Assume that $t$ is such
that $\psi(t/16) \ge \log 2$ and $t/\sigma \le 8 \sqrt{2 \log 2}$. Then

$$P(W_1(L_n, \mu) \ge t) \le K_t e^{-n t^2 / (16 \sigma^2)}$$

with

$$K_t = \exp \exp\left[ c \left( \psi(t/32) + \log(\sigma/t) \right) \right]$$

for some universal constant $c$.

A bound for $c$ may be tracked in the proof.

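Proof. Step 1. Building an approximating measure of finite support.

Denote by $K$ the unit ball of the Cameron-Martin space associated to $E$ and
$\mu$, and by $B$ the unit ball of $E$. According to the Gaussian isoperimetric
inequality (see [ID]), for all $\lambda > 0$ and $\varepsilon > 0$,

$$\mu(\lambda K + \varepsilon B) \ge \Phi\left( \lambda + \Phi^{-1}(\mu(\varepsilon B)) \right)$$

where $\Phi(t) = \int_{-\infty}^{t} e^{-u^2/2}\, du / \sqrt{2\pi}$ is the
Gaussian c.d.f. Denote by

$$\mu' = \frac{1}{\mu(\lambda K + \varepsilon B)} \mathbf{1}_{\lambda K + \varepsilon B}\, \mu$$

the restriction of $\mu$ to the enlarged ball. As proved in [5], Appendix 1,
the Gaussian measure $\mu$ satisfies a $T_2(2\sigma^2)$ inequality, hence a
$T_1$ inequality with the same constant. We have

$$W_1(\mu, \mu') \le \sqrt{2 \sigma^2 H(\mu'|\mu)} = \sqrt{-2 \sigma^2 \log \mu(\lambda K + \varepsilon B)} \le \sqrt{-2 \sigma^2 \log \Phi\left( \lambda + \Phi^{-1}(\mu(\varepsilon B)) \right)}.$$

On the other hand, denote $k = \mathcal{N}(\lambda K, \varepsilon)$ the
covering number of $\lambda K$ (w.r.t. the norm of $E$). Let
$x_1, \ldots, x_k \in \lambda K$ be such that the union of the balls
$B(x_i, \varepsilon)$ contains $\lambda K$. From the triangle inequality we get
the inclusion

$$\lambda K + \varepsilon B \subset \bigcup_{i=1}^{k} B(x_i, 2\varepsilon).$$

Choose a measurable map
$T : \lambda K + \varepsilon B \to \{x_1, \ldots, x_k\}$ such that for all $x$,
$\|x - T(x)\| \le 2\varepsilon$. The push-forward measure
$\mu^k = T_\# \mu'$ has support in the finite set $\{x_1, \ldots, x_k\}$, and
clearly

$$W_1(\mu', \mu^k) \le 2\varepsilon.$$

Choose $\varepsilon = t/16$, and

(11)  $$\lambda = \Phi^{-1}\left( e^{-t^2 / (128 \sigma^2)} \right) - \Phi^{-1}(\mu(\varepsilon B))$$

(12)  $$\phantom{\lambda} = \Upsilon^{-1}\left( e^{-\psi(t/16)} \right) + \Phi^{-1}\left( e^{-t^2 / (128 \sigma^2)} \right)$$

where $\Upsilon(t) = \int_{t}^{+\infty} e^{-u^2/2}\, du / \sqrt{2\pi}$ is the
tail of the Gaussian distribution (we have used the fact that
$\Phi^{-1} + \Upsilon^{-1} = 0$, which comes from symmetry of the Gaussian
distribution).

Altogether, this ensures that $W_1(\mu, \mu') \le t/8$ and
$W_1(\mu', \mu^k) \le t/8$, hence $W_1(\mu, \mu^k) \le t/4$.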

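Step 2. Bounding $\lambda$.

We can use the elementary bound $\Upsilon(t) \le e^{-t^2/2}$, $t \ge 0$, to get

$$\Upsilon^{-1}(u) \le \sqrt{-2 \log u}, \quad 0 < u \le 1/2,$$

which yields
$\Upsilon^{-1}(e^{-\psi(t/16)}) \le \sqrt{2 \psi(t/16)}$ as soon as
$\psi(t/16) \ge \log 2$. Likewise,

$$\Phi^{-1}\left( e^{-t^2 / (128 \sigma^2)} \right) = \Upsilon^{-1}\left( 1 - e^{-t^2 / (128 \sigma^2)} \right) \le \sqrt{-2 \log\left( 1 - e^{-t^2 / (128 \sigma^2)} \right)}$$

as soon as $t^2 / (128 \sigma^2) \le \log 2$. Moreover, for $u \le \log 2$, we
have $1 / (1 - e^{-u}) \le 2 \log 2 / u$. Putting everything together, we get

(13)  $$\lambda \le \sqrt{2 \psi(t/16)} + c \sqrt{\log(\sigma/t)}$$

for some universal constant $c > 0$. Observe that the first term in (13) will
usually be much larger than the second one.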
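Step 3.

From Theorem 2.5 we know that

$$P(W_1(\mu, L_n) \ge t) \le K_t e^{-n t^2 / (16 \sigma^2)}$$

with

$$K_t = \exp\left[ \frac{1}{2\sigma^2}\, k\, (\mathrm{Diam}\, \{x_1, \ldots, x_k\})^2 \right].$$

The diameter is bounded by
$\mathrm{Diam}\, \lambda K = 2 \sigma \lambda \le c \sigma \left( \sqrt{\psi(t/16)} + c \sqrt{\log(\sigma/t)} \right)$.

We wish now to control $k = \mathcal{N}(\lambda K, t/16)$ in terms of the small
ball function $\psi$. The two quantities are known to be connected: for
example, Lemma 1 in [19] gives the bound

$$\mathcal{N}(\lambda K, \varepsilon) \le e^{\lambda^2 / 2 + \psi(\varepsilon/2)}.$$

Thus

$$k \le \exp\left[ \psi(t/16) + \psi(t/32) + c \log(\sigma/t) \right].$$

With some elementary majorizations, this ends the proof.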

□

We can now sharpen the results of Proposition 2.8. Let $\gamma$ denote the
Wiener measure on $C([0, 1], \mathbb{R}^d)$ endowed with the sup-norm, and
denote by $\sigma^2$ its weak variance. Let $\lambda_1$ be the first nonzero
eigenvalue of the Laplacian operator on the ball of $\mathbb{R}^d$ with
homogeneous Dirichlet boundary conditions: it is well-known that the small
ball function for the Brownian motion on $\mathbb{R}^d$ is equivalent to
$\lambda_1 / t^2$ when $t \to 0$.

As a consequence, there exists $C = C(d)$ such that for small enough $t > 0$
we have

(14)  $$P(W_1(L_n, \gamma) \ge t) \le \exp \exp\left[ C \lambda_1 / t^2 \right] e^{-n t^2 / (16 \sigma^2)}.$$



2.4. Bounds in the dependent case: occupation measures of contractive
Markov chains. The results above can be extended to the convergence of the
occupation measure for a Markov process. As an example, we establish the
following result.

Theorem 2.12. Let $P(x, dy)$ be a Markov kernel on $\mathbb{R}^d$ such that

(1) the measures $P(x, \cdot)$ satisfy a $T_1(C)$ inequality

(2) $W_1(P(x, \cdot), P(y, \cdot)) \le r |x - y|$ for some $r < 1$.

Let $\pi$ denote its invariant measure. Let $(X_i)_{i \ge 0}$ denote the Markov
chain associated with $P$ under $X_0 = 0$.

Set a ^ ^ ^■\/4m^ + C log 2 — 2mij . There exists Cd > depending only on d 
such that for t < 2/ a, 



\WiiL„,n) >t)< K{n,t) exp-n ^^^J^' t^ 



where



K{n, t) = exp 



+ Cd(-log-): 

'n' 



C at at 



Remark. The result is close to the one obtained in the independent case, and,
as stressed in the introduction, it holds interest from the perspective of
numerical simulation, in cases where one cannot sample uniformly from a given
probability distribution $\pi$ but may build a Markov chain that admits $\pi$
as its invariant measure.

Remark. We comment on the assumptions on the transition kernel. The first one
ensures that the $T_1$ inequality is propagated to the laws of $X_n$,
$n \ge 1$. As for the second one, it has appeared several times in the Markov
chain literature (see e.g. [T7|, [HI) as a particular variant of the Dobrushin
uniqueness condition for Gibbs measures. It has a nice geometric
interpretation as a positive lower bound on the Ricci curvature of the Markov
chain, put forward for example in [26]. Heuristically, this condition implies
that the Markov chains started from two different points and suitably coupled
tend to get closer.
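The heuristic in the last remark can be illustrated on a toy kernel of our own choosing, which is not taken from the text: the Gaussian autoregressive kernel $P(x, \cdot) = \mathcal{N}(rx, 1)$ on $\mathbb{R}$. For this kernel $W_1(P(x, \cdot), P(y, \cdot)) = r |x - y|$, and each $P(x, \cdot)$, being a unit-variance Gaussian, satisfies a $T_1$ inequality. The sketch below couples two chains by feeding them the same noise, so that their distance contracts by the factor $r$ at every step. The name `coupled_ar1` and the seed are illustrative.

```python
import random

def coupled_ar1(x0, y0, r, steps, seed=0):
    """Run two AR(1) chains X' = r*X + noise driven by the SAME noise.

    Returns the list of gaps |X_k - Y_k| for k = 0..steps. Under this
    synchronous coupling the gap is r**k * |x0 - y0| (up to rounding).
    """
    rng = random.Random(seed)
    x, y = x0, y0
    gaps = [abs(x - y)]
    for _ in range(steps):
        noise = rng.gauss(0.0, 1.0)  # shared innovation for both chains
        x = r * x + noise
        y = r * y + noise
        gaps.append(abs(x - y))
    return gaps
```

Calling `coupled_ar1(0.0, 10.0, 0.5, 10)` returns gaps that halve at each step, whatever the seed, which is the contraction expressed by assumption (2) for this particular kernel.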



3. Proof of Theorem 2.2

The starting point is the following result, obtained by Gozlan and Léonard
([14], see Chapter 6) by studying the tensorization properties of
transportation inequalities.

Lemma 3.1. Suppose that $\mu \in \mathcal{P}(E)$ verifies a $\alpha(T_d)$
inequality. Define on $E^n$ the metric

$$d^{\oplus n}((x_1, \ldots, x_n), (y_1, \ldots, y_n)) = \sum_{i=1}^{n} d(x_i, y_i).$$

Then $\mu^{\otimes n} \in \mathcal{P}(E^n)$ verifies a
$\alpha'(T_{d^{\oplus n}})$ inequality, where $\alpha'(t) = n \alpha(t/n)$.
Hence, for all Lipschitz functionals $Z : E^n \to \mathbb{R}$ (w.r.t. the
distance $d^{\oplus n}$), we have the concentration inequality

$$\mu^{\otimes n}\left( Z \ge \int Z\, d\mu^{\otimes n} + t \right) \le \exp\left( -n \alpha\left( \frac{t}{n \|Z\|_{\mathrm{Lip}}} \right) \right) \quad \text{for all } t > 0.$$

Let $X_i$ be an i.i.d. sample of $\mu$. Recalling that

$$W_1(L_n, \mu) = \sup_{f\ 1\text{-Lip}} \frac{1}{n} \sum_{i=1}^{n} f(X_i) - \int f\, d\mu$$

and that

$$(x_1, \ldots, x_n) \mapsto \sup_{f\ 1\text{-Lip}} \frac{1}{n} \sum_{i=1}^{n} f(x_i) - \int f\, d\mu$$

is $\frac{1}{n}$-Lipschitz w.r.t. the distance $d^{\oplus n}$ on $E^n$ (as a
supremum of $\frac{1}{n}$-Lipschitz functions), the following ensues:

(15)  $$P(W_1(L_n, \mu) \ge E[W_1(L_n, \mu)] + t) \le \exp(-n \alpha(t)).$$

Therefore, we are led to seek a control on $E[W_1(L_n, \mu)]$. This is what we
do in the next lemma.

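Lemma 3.2. Let $a > 0$ be such that
$E_{a,1} = \int e^{a d(x, x_0)} \mu(dx) \le 2$.

Let $\delta > 0$ and $K \subset E$ be a compact subset containing $x_0$. Let
$\mathcal{N}_\delta$ denote the covering number of order $\delta$ for the set
$\mathcal{F}_K$ of 1-Lipschitz functions on $K$ vanishing at $x_0$ (endowed
with the uniform distance).

Also define $u : [0, +\infty) \to [1, +\infty)$ as the inverse function of
$x \mapsto x \ln x - x + 1$ on $[1, +\infty)$.

The following holds:

$$E[W_1(L_n, \mu)] \le 2\delta + \frac{8}{a} \frac{1}{u(1/\mu(K^c))} + \Gamma(\mathcal{N}_\delta, n)$$

where

$$\Gamma(\mathcal{N}_\delta, n) = \inf_{\lambda > 0} \frac{1}{\lambda} \left[ \log \mathcal{N}_\delta + n \alpha^{\circledast}\left( \frac{\lambda}{n} \right) \right].$$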

Proof. We denote by $\mathcal{F}$ the set of 1-Lipschitz functions $f$ over $E$
such that $f(x_0) = 0$. Let us denote

$$\Psi(f) = \int f\, d\mu - \int f\, dL_n;$$

we have for $f, g \in \mathcal{F}$:

$$|\Psi(f) - \Psi(g)| \le \int |f - g| \mathbf{1}_K\, d\mu + \int |f - g| \mathbf{1}_K\, dL_n + \int (|f| + |g|) \mathbf{1}_{K^c}\, d\mu + \int (|f| + |g|) \mathbf{1}_{K^c}\, dL_n$$

$$\le 2 \|f - g\|_{L^\infty(K)} + 2 \int d(x, x_0) \mathbf{1}_{K^c}\, d\mu + 2 \int d(x, x_0) \mathbf{1}_{K^c}\, dL_n.$$

When $f : E \to \mathbb{R}$ is a measurable function, denote by $f|_K$ its
restriction to $K$. Notice that for every $g \in \mathcal{F}_K$, there exists
$f \in \mathcal{F}$ such that $f|_K = g$. Indeed, one may set

$$f(x) = \begin{cases} g(x) & \text{if } x \in K \\ \inf_{y \in K} g(y) + d(x, y) & \text{otherwise} \end{cases}$$

and check that $f$ is 1-Lipschitz over $E$.
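The extension step can be tried out numerically. The Python sketch below applies the formula $f(x) = \inf_{y \in K} g(y) + d(x, y)$ under simplifying assumptions of ours: $K$ is a finite subset of the real line, $d(x, y) = |x - y|$, and $g$ is given as a dictionary of values. When $g$ is 1-Lipschitz on $K$, the infimum formula already returns $g(x)$ for $x \in K$, so a single expression covers both cases. The name `lipschitz_extension` is hypothetical.

```python
def lipschitz_extension(K, g, dist=lambda x, y: abs(x - y)):
    """Extend a 1-Lipschitz function g (dict: point of K -> value) to all of E.

    Uses f(x) = min over y in K of g(y) + d(x, y). If g is 1-Lipschitz on K,
    then f coincides with g on K and f is 1-Lipschitz everywhere.
    """
    def f(x):
        return min(g[y] + dist(x, y) for y in K)
    return f
```

For instance, with `K = [0.0, 1.0, 3.0]` and values `{0.0: 0.0, 1.0: 1.0, 3.0: 0.5}`, the extension takes the value 1.5 at the point 2, and one can check on a grid that increments of the extension never exceed the distance between the points.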

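By definition of $\mathcal{N}_\delta$, there exist functions
$g_1, \ldots, g_{\mathcal{N}_\delta} \in \mathcal{F}_K$ such that the balls of
center $g_i$ and radius $\delta$ (for the uniform distance) cover
$\mathcal{F}_K$. We can extend these functions to functions
$f_i \in \mathcal{F}$ as noted above.

Consider $f \in \mathcal{F}$ and choose $f_i$ such that $|f - f_i| \le \delta$
on $K$:

$$\Psi(f) \le \Psi(f_i) + |\Psi(f) - \Psi(f_i)|$$

$$\le \Psi(f_i) + 2\delta + 2 \int d(x, x_0) \mathbf{1}_{K^c}\, d\mu + 2 \int d(x, x_0) \mathbf{1}_{K^c}\, dL_n$$

$$\le \max_{j = 1, \ldots, \mathcal{N}_\delta} \Psi(f_j) + 2\delta + 2 \int d(x, x_0) \mathbf{1}_{K^c}\, d\mu + 2 \int d(x, x_0) \mathbf{1}_{K^c}\, dL_n.$$

The right-hand side in the last line does not depend on $f$, so it is also
greater than $W_1(L_n, \mu) = \sup_{f \in \mathcal{F}} \Psi(f)$.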

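We pass to expectations, and bound the terms on the right. We use
Orlicz-Hölder's inequality with the pair of conjugate Young functions

$$\tau(x) = \begin{cases} 0 & \text{if } x \le 1 \\ x \log x - x + 1 & \text{otherwise} \end{cases} \qquad \tau^*(x) = e^x - 1$$

(for definitions and a proof of Orlicz-Hölder's inequality, the reader may
refer to [m, Chapter 10). We get

$$\int d(x, x_0) \mathbf{1}_{K^c}\, d\mu \le 2 \|\mathbf{1}_{K^c}\|_\tau \|d(x, x_0)\|_{\tau^*}$$

where

$$\|\mathbf{1}_{K^c}\|_\tau = \inf\left\{ \theta > 0,\ \int \tau\left[ \frac{\mathbf{1}_{K^c}}{\theta} \right] d\mu \le 1 \right\}$$

and

$$\|d(x, x_0)\|_{\tau^*} = \inf\left\{ \theta > 0,\ \int \left( e^{d(x, x_0)/\theta} - 1 \right) d\mu \le 1 \right\}.$$

It is easily seen that $\|\mathbf{1}_{K^c}\|_\tau = 1 / u(1/\mu(K^c))$. And we
assumed that $a$ is such that
$E_{a,1} = \int \exp(a\, d(x, x_0))\, d\mu \le 2$, so
$\|d(x, x_0)\|_{\tau^*} \le 1/a$. Altogether, this yields

$$\int d(x, x_0) \mathbf{1}_{K^c}\, d\mu \le \frac{2}{a} \frac{1}{u(1/\mu(K^c))}.$$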

Also, if Xi, . . . , Xn are i.i.d. variables of law fi, 

E[[ d{x,xo)lK^dL„]=E[d{Xi,xo)lK4Xi)] < -^TT^ 



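as seen above. Putting this together yields the inequality

$$E[W_1(L_n, \mu)] \le 2\delta + \frac{8}{a} \, \frac{1}{\sigma(1/\mu(K^c))} + E\Big[\max_{j = 1, \dots, \mathcal{N}_\delta} \Psi(f_j)\Big].$$

The remaining term can be bounded by a form of maximal inequality. First fix some $i$ and $\lambda > 0$: we have

$$\begin{aligned}
E[\exp \lambda \Psi(f_i)] &= E\Big[\exp \frac{\lambda}{n} \sum_{j=1}^n \Big(\int f_i \, d\mu - f_i(X_j)\Big)\Big] \\
&= \Big(E\Big[\exp \frac{\lambda}{n} \Big(\int f_i \, d\mu - f_i(X_1)\Big)\Big]\Big)^n \\
&\le e^{n \alpha^\circledast(\lambda/n)} .
\end{aligned}$$

In the last line, we have used the dual estimate of Proposition A.3, applied to the 1-Lipschitz function $-f_i$. Using Jensen's inequality, we may then write

$$\begin{aligned}
E\Big[\max_{j = 1, \dots, \mathcal{N}_\delta} \Psi(f_j)\Big] &\le \frac{1}{\lambda} \log E\Big[\max_{j = 1, \dots, \mathcal{N}_\delta} \exp \lambda \Psi(f_j)\Big] \\
&\le \frac{1}{\lambda} \log \sum_{j=1}^{\mathcal{N}_\delta} E[\exp \lambda \Psi(f_j)] \\
&\le \frac{1}{\lambda} \Big[\log \mathcal{N}_\delta + n \alpha^\circledast\Big(\frac{\lambda}{n}\Big)\Big].
\end{aligned}$$

So minimizing in $\lambda$ we have

$$E\Big[\max_{j = 1, \dots, \mathcal{N}_\delta} \Psi(f_j)\Big] \le \Gamma(\mathcal{N}_\delta, n).$$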
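As an illustration of the minimization in $\lambda$, the following Python sketch evaluates $\inf_{\lambda > 0} \frac{1}{\lambda}[\log \mathcal{N}_\delta + n \alpha^\circledast(\lambda/n)]$ numerically and compares it with a closed form. It assumes the quadratic case $\alpha^\circledast(s) = C s^2/4$ (the $T_1(C)$ setting of the appendix); the function names and the grid size are hypothetical choices made for this example only.

```python
import math


def gamma_closed_form(log_n_delta: float, n: int, C: float) -> float:
    """Closed-form minimum sqrt(C log N / n), valid when alpha_conj(s) = C s^2 / 4."""
    return math.sqrt(C * log_n_delta / n)


def gamma_numeric(log_n_delta: float, n: int, C: float, grid: int = 20000) -> float:
    """Grid minimisation of (1/lam) * (log N + n * C * (lam/n)^2 / 4) over lam > 0."""
    lam_star = 2.0 * math.sqrt(n * log_n_delta / C)   # minimiser from calculus
    best = float("inf")
    for k in range(1, grid + 1):
        lam = 2.0 * lam_star * k / grid               # scan the interval (0, 2 lam_star]
        value = (log_n_delta + n * C * (lam / n) ** 2 / 4.0) / lam
        best = min(best, value)
    return best
```

In this quadratic case the bound decays like $n^{-1/2}$ for fixed $\delta$, which is the rate visible in `gamma_closed_form`.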

Bringing it all together finishes the proof of the lemma.

□

We can now finish the proof of Theorem



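Proof. Come back to the deviation bound (13). Choose $\delta = t/8$, and choose $K$ such that

$$\mu(K^c) \le \Big(\frac{32}{at} \log \frac{32}{at} - \frac{32}{at} + 1\Big)^{-1}.$$

We thus have $2\delta + 8[a \sigma(1/\mu(K^c))]^{-1} \le t/2$, which implies

$$E \, W_1(L_n, \mu) \le t/2 + \Gamma(\mathcal{C}_t, n) \tag{16}$$

and so

$$P(W_1(L_n, \mu) \ge t) \le \exp -n \alpha\Big(\frac{t}{2} - \Gamma(\mathcal{N}_\delta, n)\Big),$$

with the convention $\alpha(y) = 0$ if $y < 0$.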



□ 
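The arithmetic behind the choice of $K$ can be spelled out as follows. This is a short verification sketch, assuming that $K$ is chosen so that $\mu(K^c) \le \tau(32/(at))^{-1}$ with $\tau(x) = x \log x - x + 1$, that $32/(at) \ge 1$, and that $\sigma$ is the inverse of $\tau$ on $[1, +\infty)$:

```latex
\mu(K^c) \le \tau\!\Big(\frac{32}{at}\Big)^{-1}
\;\Longrightarrow\;
\sigma\!\Big(\frac{1}{\mu(K^c)}\Big) \ge \frac{32}{at}
\;\Longrightarrow\;
\frac{8}{a\,\sigma(1/\mu(K^c))} \le \frac{t}{4},
\qquad\text{hence}\qquad
2\delta + \frac{8}{a\,\sigma(1/\mu(K^c))} \le \frac{t}{4} + \frac{t}{4} = \frac{t}{2}
\quad\text{for } \delta = \frac{t}{8}.
```

The first implication uses only that $\sigma$ is increasing.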



4. Proof of Theorem 2.5

In this section, we provide a different approach to our result in the independent case. As earlier, we first aim to bound the speed of convergence of the average $W_1$ distance between the empirical and true measures. The lemma below provides another way to obtain such an estimate.

Lemma 4.1. Let $\mu^k \in \mathcal{P}(E)$ be a finitely supported measure such that $|\mathrm{Supp}\, \mu^k| \le k$. Let $D(\mu^k) = \mathrm{Diam}\, \mathrm{Supp}\, \mu^k$ be the diameter of $\mathrm{Supp}\, \mu^k$. The following holds:

$$E \, W_1(\mu, L_n) \le 2 W_1(\mu, \mu^k) + D(\mu^k) \sqrt{\frac{k}{n}}.$$

Proof. Let $\pi_{opt}$ be an optimal coupling of $\mu$ and $\mu^k$ (it exists: see e.g. Theorem 4.1 in [32]), and let $(X_i, Y_i)$, $1 \le i \le n$, be i.i.d. variables on $E \times E$ with common law $\pi_{opt}$.

Let $L_n = \frac{1}{n} \sum_{i=1}^n \delta_{X_i}$ and $L_n^k = \frac{1}{n} \sum_{i=1}^n \delta_{Y_i}$. By the triangle inequality, we have

$$W_1(L_n, \mu) \le W_1(L_n, L_n^k) + W_1(L_n^k, \mu^k) + W_1(\mu^k, \mu).$$

With our choice of coupling for $L_n$ and $L_n^k$ it is easily seen that

$$E \, W_1(L_n, L_n^k) \le W_1(\mu, \mu^k).$$

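Let us take care of the last term. Write $x_1, \dots, x_k$ for the points of $\mathrm{Supp}\, \mu^k$. We use Lemma 4.2 below to obtain that

$$\begin{aligned}
E \, W_1(L_n^k, \mu^k) &\le D(\mu^k) \Big(1 - E \sum_{i=1}^k \mu^k(x_i) \wedge L_n^k(x_i)\Big) \\
&= D(\mu^k) \sum_{i=1}^k E\big(\mu^k(x_i) - \mu^k(x_i) \wedge L_n^k(x_i)\big) \\
&\le D(\mu^k) \sum_{i=1}^k E\big|\mu^k(x_i) - L_n^k(x_i)\big| .
\end{aligned}$$

Observe that the variables $n L_n^k(x_i)$ follow binomial laws with parameters $\mu^k(x_i)$ and $n$. We get:

$$E \, W_1(L_n^k, \mu^k) \le D(\mu^k) \sum_{i=1}^k \sqrt{\frac{\mu^k(x_i)(1 - \mu^k(x_i))}{n}} \le D(\mu^k) \sqrt{\frac{k}{n}}$$

(the last inequality being a consequence of the Cauchy-Schwarz inequality).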

□ 
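The two inequalities used at the end of this proof can be checked numerically on a small example. The sketch below computes $E|N/n - p|$ exactly for a binomial variable $N$ and compares the sum over the support with the Cauchy-Schwarz bound $\sqrt{k/n}$; the helper names and the sample weights are assumptions chosen for illustration.

```python
import math


def mean_abs_dev(n: int, p: float) -> float:
    """Exact value of E|N/n - p| for N ~ Binomial(n, p)."""
    return sum(
        math.comb(n, j) * p**j * (1 - p) ** (n - j) * abs(j / n - p)
        for j in range(n + 1)
    )


def lemma_terms(weights, n):
    """Return (sum_i E|L(x_i) - mu(x_i)|, sum_i sqrt(mu_i (1 - mu_i) / n), sqrt(k / n))."""
    exact = sum(mean_abs_dev(n, p) for p in weights)
    variance_bound = sum(math.sqrt(p * (1 - p) / n) for p in weights)
    return exact, variance_bound, math.sqrt(len(weights) / n)
```

With `weights = [0.5, 0.3, 0.2]` and `n = 50`, the three returned values are in increasing order, as the proof requires.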

Lemma 4.2. Let $\mu$, $\nu$ be probability measures with support in a finite metric space $\{x_1, \dots, x_k\}$ of diameter bounded by $D$. Then

$$W_1(\mu, \nu) \le D \Big(1 - \sum_{i=1}^k \mu(x_i) \wedge \nu(x_i)\Big).$$

Proof. We build a coupling of $\mu$ and $\nu$ that leaves as much mass in place as possible, in the following fashion: set $f(x_i) = \mu(x_i) \wedge \nu(x_i)$ and $\lambda = \sum_{i=1}^k f(x_i)$. Set $q(x_i) = f(x_i)/\lambda$, and define the measures

$$\mu_1 = \frac{\mu - f}{1 - \lambda}, \qquad \nu_1 = \frac{\nu - f}{1 - \lambda}.$$

Finally, build independent random variables $X_1 \sim \mu_1$, $Y_1 \sim \nu_1$, $Z \sim q$ and $B$ with Bernoulli law of parameter $\lambda$. Define

$$X = (1 - B) X_1 + B Z, \qquad Y = (1 - B) Y_1 + B Z.$$

It is an easy check that $X \sim \mu$, $Y \sim \nu$. Thus we have the bound

$$W_1(\mu, \nu) \le E|X - Y| = (1 - \lambda) E|X_1 - Y_1| \le D(1 - \lambda)$$

and this concludes the proof.

□ 
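The bound of Lemma 4.2 can be tested on measures carried by finitely many points of the real line, where $W_1$ has an explicit expression through cumulative distribution functions. Restricting to the real line is an assumption made only to keep this sketch self-contained (the lemma itself holds on any finite metric space); the function names are hypothetical.

```python
def w1_on_line(points, mu, nu):
    """W_1 between two measures on sorted real points, via the CDF formula."""
    total, cdf_gap = 0.0, 0.0
    for i in range(len(points) - 1):
        cdf_gap += mu[i] - nu[i]                       # F_mu(x_i) - F_nu(x_i)
        total += abs(cdf_gap) * (points[i + 1] - points[i])
    return total


def overlap_bound(points, mu, nu):
    """Right-hand side D * (1 - sum_i min(mu_i, nu_i)) of the lemma."""
    diameter = max(points) - min(points)
    return diameter * (1.0 - sum(min(m, v) for m, v in zip(mu, nu)))
```

On the points `[0.0, 1.0, 3.0]` with `mu = [0.5, 0.3, 0.2]` and `nu = [0.2, 0.3, 0.5]`, the excess mass must travel across the whole support, so both functions return the same value (up to rounding) and the bound is attained.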

Proof of Theorem 2.5. As stated earlier, we have the concentration bound

$$P\big(W_1(L_n, \mu) \ge E[W_1(L_n, \mu)] + t\big) \le \exp -n \alpha(t).$$

The proof is concluded by arguments similar to the ones used before, calling upon Lemma 4.1 to bound the mean.

□ 

5. Proofs in the dependent case 

Before proving Theorem 2.12 we establish a more general result in the spirit of Lemma 3.2.

As earlier, the first ingredient we need to apply our strategy of proof is a tensorization property for the transport-entropy inequalities in the case of non-independent sequences. To this end, we restate results from [17], where only $T_1$ inequalities were investigated, in our framework.

For $x = (x_1, \dots, x_n) \in E^n$, and $1 \le i \le n$, denote $x^i = (x_1, \dots, x_i)$. Endow $E^n$ with the distance $d_1(x, y) = \sum_{i=1}^n d(x_i, y_i)$. Let $\nu \in \mathcal{P}(E^n)$; the notation $\nu^i(dx_1, \dots, dx_i)$ stands for the marginal measure on $E^i$, and $\nu^i(\cdot \,|\, x^{i-1})$ stands for the regular conditional law of $x_i$ knowing $x^{i-1}$, or in other words the conditional disintegration of $\nu^i$ with respect to $\nu^{i-1}$ at $x^{i-1}$ (its existence is assumed throughout).

The next theorem is a slight extension of Theorem 2.11 in [17]. Its proof can be adapted without difficulty, and we omit it here.

Theorem 5.1. Let $\nu \in \mathcal{P}(E^n)$ be a probability measure such that

(1) For all $i \ge 1$ and all $x^{i-1} \in E^{i-1}$ ($E^0 = \{x_0\}$), $\nu^i(\cdot \,|\, x^{i-1})$ satisfies an $\alpha(T_d)$ inequality, and

(2) There exists $S > 0$ such that for every 1-Lipschitz function

$$f : (x_{k+1}, \dots, x_n) \mapsto f(x_{k+1}, \dots, x_n),$$

for all $x^{k-1} \in E^{k-1}$ and $x_k, y_k \in E$, we have

$$\Big| E_\nu\big(f(X_{k+1}, \dots, X_n) \,\big|\, X^k = (x^{k-1}, x_k)\big) - E_\nu\big(f(X_{k+1}, \dots, X_n) \,\big|\, X^k = (x^{k-1}, y_k)\big) \Big| \le S \, d(x_k, y_k). \tag{17}$$

Then $\nu$ verifies the transportation inequality $\tilde{\alpha}(T_d) \le H$ with

$$\tilde{\alpha}(t) = n \alpha\Big(\frac{t}{n(1 + S)}\Big).$$

In the case of a homogeneous Markov chain $(X_n)_{n \in \mathbb{N}}$ with transition kernel $P(x, dy)$, the next proposition gives sufficient conditions on the transition probabilities for the laws of the variables $X_n$ and the path-level law of $(X_1, \dots, X_n)$ to satisfy some transportation inequalities. Once again the statement and its proof are adaptations of the corresponding Proposition 2.10 of [17].

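Proposition 5.2. Let $P(x, dy)$ be a Markov kernel such that

(1) the transition measures $P(x, \cdot)$ satisfy $\alpha(T_d) \le H$ for all $x \in E$, and

(2) $W_1(P(x, \cdot), P(y, \cdot)) \le r \, d(x, y)$ for all $x, y \in E$ and some $r < 1$.

Then there exists a unique invariant probability measure $\pi$ for the Markov chain associated to $P$, and the measures $P^n(x, \cdot)$ and $\pi$ satisfy $\alpha'(T_d) \le H$, where $\alpha'(t) = \frac{1}{1-r} \alpha((1 - r) t)$.

Moreover, under these hypotheses, the conditions of Theorem 5.1 are verified with $S = \frac{r}{1-r}$, so that the law $P_n$ of the $n$-tuple $(X_1, \dots, X_n)$ under $X_0 = x_0 \in E$ verifies $\tilde{\alpha}(T_d) \le H$ where $\tilde{\alpha}(t) = n \alpha\big(\frac{1-r}{n} t\big)$.

Proof. The first claim is obtained exactly as in the proof of Proposition 2.10 in [17], observing that the contraction condition (2) is equivalent to

$$W_1(\nu_1 P, \nu_2 P) \le r W_1(\nu_1, \nu_2) \quad \text{for all } \nu_1, \nu_2 \in \mathcal{P}_1(E)$$

and also to

$$\|Pf\|_{Lip} \le r \|f\|_{Lip} \quad \text{for all } f.$$

This entails that whenever $f$ is 1-Lipschitz, $P^n f$ is $r^n$-Lipschitz. Now, by condition (1) we have

$$\begin{aligned}
P^n(e^{sf}) &\le P^{n-1}\big(\exp(s Pf + \alpha^\circledast(s))\big) \\
&\le P^{n-2}\big(\exp(s P^2 f + \alpha^\circledast(s) + \alpha^\circledast(rs))\big) \\
&\le \dots \\
&\le \exp\big(s P^n f + \alpha^\circledast(s) + \dots + \alpha^\circledast(r^{n-1} s)\big).
\end{aligned}$$

As $\alpha^\circledast$ is convex and vanishes at 0, we have $\alpha^\circledast(rt) \le r \alpha^\circledast(t)$ for all $t \ge 0$. Thus,

$$P^n(e^{sf}) \le \exp\Big(s P^n f + \sum_{k=0}^{+\infty} r^k \alpha^\circledast(s)\Big) = \exp\Big(s P^n f + \frac{1}{1-r} \alpha^\circledast(s)\Big).$$

It remains only to check that $\frac{1}{1-r} \alpha^\circledast$ is the monotone conjugate of $\alpha'$ and to invoke Proposition A.3.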
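The conjugacy check left to the reader is a one-line change of variables. Here is a sketch, writing $c = 1/(1-r)$ and assuming $\alpha'(t) = c \, \alpha(t/c)$, which is the same as $\alpha'(t) = \frac{1}{1-r}\alpha((1-r)t)$:

```latex
(\alpha')^{\circledast}(s)
  = \sup_{t \ge 0} \big( s t - c\,\alpha(t/c) \big)
  = \sup_{u \ge 0} \big( c\,s u - c\,\alpha(u) \big)   % substitute t = c u
  = c \sup_{u \ge 0} \big( s u - \alpha(u) \big)
  = \frac{1}{1-r}\,\alpha^{\circledast}(s).
```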

Moving on to the final claim, since the process is homogeneous, to ensure that (17) is satisfied, we need only show that for all $k \ge 1$, for all $f : E^k \to \mathbb{R}$ 1-Lipschitz w.r.t. $d_1$, the function

$$x \mapsto E[f(X_1, \dots, X_k) \,|\, X_0 = x]$$

is $\frac{r}{1-r}$-Lipschitz. We show the following: if $g : E^2 \to \mathbb{R}$ is such that for all $x_1, x_2 \in E$ the functions $g(\cdot, x_2)$, resp. $g(x_1, \cdot)$, are 1-Lipschitz, resp. $\lambda$-Lipschitz, then the function

$$x_1 \mapsto \int g(x_1, x_2) P(x_1, dx_2)$$

is $(1 + \lambda r)$-Lipschitz. Indeed,

$$\begin{aligned}
\Big| \int g(x_1, x_2) P(x_1, dx_2) - \int g(y_1, x_2) P(y_1, dx_2) \Big| &\le \int |g(x_1, x_2) - g(y_1, x_2)| P(x_1, dx_2) \\
&\quad + \Big| \int g(y_1, x_2) \big(P(x_1, dx_2) - P(y_1, dx_2)\big) \Big| \\
&\le (1 + \lambda r) \, d(x_1, y_1).
\end{aligned}$$

It follows easily by induction that the function

$$f_k : x_1 \mapsto \int f(x_1, \dots, x_k) P(x_{k-1}, dx_k) \cdots P(x_1, dx_2)$$

has Lipschitz norm bounded by $1 + r + \dots + r^{k-1} \le \frac{1}{1-r}$. Hence the function $x \mapsto \int f_k(x_1) P(x, dx_1)$ has Lipschitz norm bounded by $\frac{r}{1-r}$. But this function is precisely

$$x \mapsto E[f(X_1, \dots, X_k) \,|\, X_0 = x]$$

and the proof is complete. □

We are in position to prove the analogue of Lemma 3.2 in the Markov case.

Lemma 5.3. Consider the Markov chain associated to a transition kernel $P$ as in Proposition 5.2. Let $P_n$ denote the law of the Markov path $(X_1, \dots, X_n)$ associated to $P$ under $X_0 = x_0$. Introduce the averaged occupation measure $\pi_n = E_{P_n}(L_n)$ and the invariant measure $\pi$. Let $m_1 = \int d(x, x_0) \pi(dx)$.

Suppose that there exists $a > 0$ such that for all $i \ge 1$, $E_{a,i} = \int e^{a \, d(x, x_0)} P^i(x_0, dx) \le 2$.

Let $\delta > 0$ and $K \subset E$ be a compact subset containing $x_0$. Let $\mathcal{N}_\delta$ denote the metric entropy of order $\delta$ for the set $\mathcal{F}_K$ of 1-Lipschitz functions on $K$ vanishing at $x_0$ (endowed with the uniform distance). Also define $\sigma : [0, +\infty) \to [1, +\infty)$ as the inverse function of $x \mapsto x \ln x - x + 1$ on $[1, +\infty)$.

The following holds:

$$W_1(\pi_n, \pi) \le \frac{m_1}{n(1-r)}, \qquad E_{P_n}[W_1(L_n, \pi_n)] \le 2\delta + \frac{8}{an} \sum_{i=1}^n \frac{1}{\sigma(1/P^i(x_0, K^c))} + \Gamma(\mathcal{N}_\delta, n),$$

where $\Gamma(\mathcal{N}_\delta, n) = \inf_{\lambda > 0} \frac{1}{\lambda} \Big[\log \mathcal{N}_\delta + n \alpha^\circledast\Big(\frac{\lambda}{n(1-r)}\Big)\Big]$.

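Proof. Convergence to the equilibrium measure is dealt with using the contraction hypothesis. Indeed, by convexity of the map $\mu \mapsto W_1(\mu, \pi)$, we first have

$$W_1(\pi_n, \pi) \le \frac{1}{n} \sum_{i=1}^n W_1(P^i(x_0, \cdot), \pi).$$

Now, using that the contraction property (2) in Proposition 5.2 is equivalent to the inequality $W_1(\mu_1 P, \mu_2 P) \le r W_1(\mu_1, \mu_2)$ for all $\mu_1, \mu_2 \in \mathcal{P}_1(E)$, and using the fact that $\pi$ is $P$-invariant,

$$W_1(\pi_n, \pi) \le \frac{1}{n} \sum_{i=1}^n r^i W_1(\delta_{x_0}, \pi) \le \frac{W_1(\delta_{x_0}, \pi)}{n(1-r)} = \frac{m_1}{n(1-r)}.$$

In order to take care of the second term, we will use the same strategy (and notations) as in the independent case. Introduce once again a compact subset $K \subset E$ and a covering of $\mathcal{F}_K$ by functions $f_1, \dots, f_{\mathcal{N}_\delta}$ suitably extended to $E$. With the same arguments as before, we get

$$\begin{aligned}
E_{P_n} W_1(L_n, \pi_n) &\le E_{P_n}\Big(\max_{j = 1, \dots, \mathcal{N}_\delta} \Psi(f_j)\Big) + 2\delta + 2 \int d(x, x_0) 1_{K^c} \, d\pi_n \\
&\quad + 2 E_{P_n}\Big(\int d(x, x_0) 1_{K^c} \, dL_n\Big).
\end{aligned}$$

Then,

$$\int d(x_0, y) 1_{K^c}(y) \, \pi_n(dy) = \frac{1}{n} \sum_{i=1}^n \int d(x_0, y) 1_{K^c}(y) \, P^i(x_0, dy).$$

As before we can use the Orlicz-Hölder inequality to recover the bound

$$\int d(x_0, y) 1_{K^c}(y) \, d\pi_n \le \frac{2}{a} \, \frac{1}{n} \sum_{i=1}^n \frac{1}{\sigma(1/P^i(x_0, K^c))}.$$

And likewise,

$$E\Big(\int d(x, x_0) 1_{K^c} \, dL_n\Big) = E\Big[\frac{1}{n} \sum_{i=1}^n d(x_0, X_i) 1_{K^c}(X_i)\Big] = \frac{1}{n} \sum_{i=1}^n \int d(x_0, y) 1_{K^c}(y) \, P^i(x_0, dy)$$

and we have the same bound as above.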

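As for the last remaining term: it will be possible to use the maximal inequality techniques just as in the proof of Theorem 2.2 provided that we can find bounds on the terms $E[\exp \lambda \Psi(f_j)]$, where this time

$$\Psi(f) = \int f \, dL_n - \int f \, d\pi_n .$$

Denote

$$F_j(x_1, \dots, x_n) = \frac{1}{n} \sum_{i=1}^n f_j(x_i).$$

This is a $\frac{1}{n}$-Lipschitz function on $E^n$. Since $P_n$ satisfies an $\tilde{\alpha}(T_d) \le H$ inequality, we have

$$\int \exp(\lambda F_j) \, dP_n \le \exp\Big(\lambda \int F_j \, dP_n + n \alpha^\circledast\Big(\frac{\lambda}{n(1-r)}\Big)\Big).$$

But this is exactly the bound

$$E[\exp \lambda \Psi(f_j)] \le e^{n \alpha^\circledast\left(\frac{\lambda}{n(1-r)}\right)}.$$

We may then proceed as in the independent case and obtain

$$E\Big[\max_{j = 1, \dots, \mathcal{N}_\delta} \Psi(f_j)\Big] \le \inf_{\lambda > 0} \frac{1}{\lambda} \Big[\log \mathcal{N}_\delta + n \alpha^\circledast\Big(\frac{\lambda}{n(1-r)}\Big)\Big].$$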



□

For any Lipschitz function $f : E^n \to \mathbb{R}$ (w.r.t. $d_1$), we have the concentration inequality

$$P_n\Big(x \in E^n, \ f(x) \ge \int f \, dP_n + t\Big) \le \exp -n \alpha\Big(\frac{(1-r) t}{n \|f\|_{Lip}}\Big).$$

Remembering that $E^n \ni x \mapsto W_1(L_n^x, \pi_n)$ is $\frac{1}{n}$-Lipschitz, we get the bound

$$P\big(W_1(L_n, \pi_n) \ge E_{P_n}[W_1(L_n, \pi_n)] + t\big) \le \exp -n \alpha((1-r) t). \tag{18}$$

Thanks to the triangle inequality $W_1(L_n, \pi_n) \ge W_1(L_n, \pi) - W_1(\pi_n, \pi)$, it holds that

$$P\big(W_1(L_n, \pi) \ge W_1(\pi_n, \pi) + E_{P_n}[W_1(L_n, \pi_n)] + t\big) \le \exp -n \alpha((1-r) t). \tag{19}$$

This in turn leads to an estimate on the deviations, under the condition that we may exhibit a compact set with large measure for all the measures $P^i$. We now move on to the proof of Theorem 2.12.

Proof of Theorem 2.12. Fix $\delta = t/8$. Set $m_1^i = \int |x| P^i(dx)$. We have

$$m_1^i \le m_1 + W_1(P^i, \pi) \le 2 m_1 .$$
Thus 

With $a$ as in the theorem, the above ensures that $\int e^{a|x|} P^i(dx) \le 2$.

Let $B_R$ denote the ball of center $0$ and radius $R$: we have $P^i(B_R^c) \le 2 e^{-aR}$. Let

a at 
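so that $2\delta + \frac{8}{an} \sum_{i=1}^n \frac{1}{\sigma(1/P^i(K^c))} \le t/2$.

As $\alpha(t) = \frac{1}{C} t^2$ we can compute

$$\Gamma(\mathcal{N}_\delta, n) = \frac{1}{1-r} \sqrt{\frac{C \log \mathcal{N}_\delta}{n}}.$$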



We have chosen $K = B_R$ and $\delta = t/8$. Working as in the proof of Proposition 2.7, when $t \le 2/a$ we can bound $\log \mathcal{N}_\delta$ by

logAfs<Cd{-\og-y 
at at 

where $C_d$ is a numerical constant depending on the dimension $d$. Plugging the above estimates in (19) and using the inequality $(u - v)^2 \ge u^2/2 - v^2$ gives the desired result.

□

Appendix A. Some facts on transportation inequalities

A crucial feature of transportation inequalities is that they imply the concentration of measure phenomenon, a fact first discovered by Marton. The following proposition is obtained by a straightforward adaptation of her famous argument:

Proposition A.1. If $\mu$ verifies an $\alpha(T_d)$ inequality, then for all measurable sets $A \subset E$ with $\mu(A) \ge \frac{1}{2}$ and $r \ge r_0 = \alpha^{-1}(\log 2)$,

$$\mu(A^r) \ge 1 - e^{-\alpha(r - r_0)}$$

where $A^r = \{x \in E, \ d(x, A) \le r\}$.

Moreover, let $X$ be a r.v. with law $\mu$. For all 1-Lipschitz functions $f : E \to \mathbb{R}$ and all $r \ge r_0$, we have

$$P(f(X) \ge m_f + r) \le e^{-\alpha(r - r_0)}$$

where $m_f$ denotes a median of $f$.

Bobkov and Götze ([4]) were the first to obtain an equivalent dual formulation of transportation inequalities. We present it here in a more general form obtained by Gozlan and Léonard (see [15]), in the case when the transportation cost function is the distance.

Definition A.2. Let $\alpha : [0, +\infty) \to \mathbb{R}$ be convex, increasing, left-continuous and vanishing at 0. The monotone conjugate of $\alpha$ is

$$\alpha^\circledast(s) = \sup_{t \ge 0} \, \big(st - \alpha(t)\big).$$

Proposition A.3 ([15]). Assume that $d$ is a metric defining the topology of $E$, and that there exist $a > 0$, $x_0 \in E$ such that $\int \exp[a \, d(x, x_0)] \mu(dx) < +\infty$. Then $\mu$ satisfies the $\alpha(T_d)$ inequality

$$\alpha(T_d(\mu, \nu)) \le H(\nu | \mu)$$

for all $\nu \in \mathcal{P}(E)$ with finite first moment if and only if for all $f : E \to \mathbb{R}$ 1-Lipschitz and all $\lambda > 0$,

$$\int e^{\lambda (f(x) - \int f \, d\mu)} \mu(dx) \le e^{\alpha^\circledast(\lambda)}. \tag{20}$$

In the case $T_1(C)$, Condition (20) becomes: for all 1-Lipschitz $f : E \to \mathbb{R}$ and $\lambda > 0$,

$$\int e^{\lambda (f - \int f \, d\mu)} \mu(dx) \le e^{C \lambda^2 / 4}. \tag{21}$$
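The passage from the general dual bound to the quadratic case rests on the monotone conjugate of $\alpha(t) = t^2/C$, which equals $C \lambda^2/4$. The Python sketch below approximates a monotone conjugate on a grid so that this identity can be checked numerically; the function name, the truncation `t_max`, and the grid size are assumptions made for this illustration, and the truncation must contain the maximizer for the approximation to be meaningful.

```python
def monotone_conjugate(alpha, s, t_max=50.0, steps=200000):
    """Approximate sup over t in [0, t_max] of s * t - alpha(t) on a uniform grid."""
    best = 0.0                          # t = 0 contributes 0 because alpha(0) = 0
    for k in range(1, steps + 1):
        t = t_max * k / steps
        best = max(best, s * t - alpha(t))
    return best
```

For example, with `C = 2.0` and `alpha = lambda t: t * t / C`, the call `monotone_conjugate(alpha, 3.0)` agrees with `C * 3.0**2 / 4` up to the grid resolution.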

A.0.1. Integral criteria. An interesting feature of transportation inequalities is that some of them are characterized by simple moment conditions, which makes them tractable to verify. In [17], Djellout, Guillin and Wu showed that $\mu$ satisfies a $T_1$ inequality if and only if $\int \exp[a_0 d^2(x_0, y)] \mu(dy) < +\infty$ for some $a_0$ and some $x_0$. They also connect the value of $a_0$ and of the Gaussian moment with the value of the constant $C$ appearing in the transportation inequality. More generally, Gozlan and Léonard provide in [14] a nice criterion to ensure that an $\alpha(T_d)$ inequality holds. We only quote here one side of what is actually an equivalence:

Theorem A.4. Let $\mu \in \mathcal{P}(E)$. Suppose there exists $a > 0$ with $\int e^{a \, d(x_0, x)} \mu(dx) \le 2$ for some $x_0 \in E$, and a convex, increasing function $\gamma$ on $[0, +\infty)$ vanishing at $0$ and $x_1 \in E$ such that



Then $\mu$ satisfies the $\alpha(T_d)$ inequality

$$\alpha(W_1(\mu, \nu)) \le H(\nu | \mu)$$

for all $\nu \in \mathcal{P}_1(E)$ with finite first moment, with



One particular instance of the result above was first obtained by Bolley and Villani, with sharper constants, in the case when $\mu$ only has a finite exponential moment ([8], Corollary 2.6). Their technique involves the study of weighted Pinsker inequalities, and encompasses more generally costs of the form $d^p$, $p \ge 1$ (we give only the case $p = 1$ here).

Theorem A.5. Let $a > 0$ be such that $E_{a,1} = \int e^{a \, d(x_0, x)} \mu(dx) < +\infty$. Then for $\nu \in \mathcal{P}_1(E)$, we have



where $C = \frac{2}{a} \big(\frac{3}{2} + \log E_{a,1}\big) < +\infty$.

And in the case when $\mu$ admits a finite Gaussian moment, the following holds ([8], Corollary 2.4):

Theorem A.6. Let $a > 0$ be such that $E_{a,2} = \int e^{a \, d^2(x_0, x)} \mu(dx) < +\infty$. Then $\mu$ satisfies a $T_1(C)$ inequality where $C = \frac{2}{a} (1 + \log E_{a,2}) < +\infty$.

Appendix B. Covering numbers of the set of 1-Lipschitz functions

In this section, we provide bounds for the covering numbers of the set of 1-Lipschitz functions over a precompact space.

Note that these results are likely not new. However, we have been unable to find an original work, so we provide proofs for completeness.

Let $(K, d)$ be a precompact metric space, and let $\mathcal{N}(K, \delta)$ denote the minimal number of balls of radius $\delta$ necessary to cover $K$. Let $x_0 \in K$ be a fixed point, and let $\mathcal{F}$ denote the set of 1-Lipschitz functions over $K$ vanishing at $x_0$. This is also a precompact space when endowed with the metric of uniform convergence. We denote by $\mathcal{N}(\mathcal{F}, \delta)$ the minimal number of balls of radius $\delta$ necessary to cover $\mathcal{F}$. Finally, we set $R = \max_{x \in K} d(x, x_0)$.

Our first estimate is a fairly crude one.

Proposition B.1. We have

$$\mathcal{N}(\mathcal{F}, 3\varepsilon) \le \Big(2 + 2 \Big\lfloor \frac{R}{\varepsilon} \Big\rfloor\Big)^{\mathcal{N}(K, \varepsilon)}.$$

Proof. For simplicity, write $n = \mathcal{N}(K, \varepsilon)$. Let $x_1, \dots, x_n$ be the centers of a set of balls covering $K$. For any $f \in \mathcal{F}$ and $1 \le i \le n$, we have

$$|f(x_i)| = |f(x_i) - f(x_0)| \le R.$$

For any $n$-tuple of integers $k = (k_1, \dots, k_n)$ such that $-\lfloor R/\varepsilon \rfloor - 1 \le k_i \le \lfloor R/\varepsilon \rfloor$, $1 \le i \le n$, choose a function $f_k \in \mathcal{F}$ such that $k_i \varepsilon \le f_k(x_i) \le (k_i + 1) \varepsilon$ if there exists one.

Consider $f \in \mathcal{F}$. Let $l_i = \lfloor f(x_i)/\varepsilon \rfloor$ and $l = (l_1, \dots, l_n)$. Then the function $f_l$ defined above exists and $|f(x_i) - f_l(x_i)| \le \varepsilon$ for $1 \le i \le n$. But then for any $x \in K$ there exists $i$, $1 \le i \le n$, such that $x \in B(x_i, \varepsilon)$, and thus

$$|f(x) - f_l(x)| \le |f(x) - f(x_i)| + |f(x_i) - f_l(x_i)| + |f_l(x_i) - f_l(x)| \le 3\varepsilon.$$

This implies that $\mathcal{F}$ is covered by the balls of center $f_k$ and radius $3\varepsilon$. As there are at most $(2 + 2\lfloor R/\varepsilon \rfloor)^n$ choices for $k$, this ends the proof.

□ 

However, this bound is quite weak: as one can see by considering the case of a segment, for most choices of an $n$-tuple, there will not exist a function in $\mathcal{F}$ satisfying the requirements in the proof. With the extra assumption that $K$ is connected, we can get a more refined result.
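The size of the gap can be illustrated on a toy discretisation of a segment. The sketch below counts grid-valued functions on $n$ equally spaced points that vanish at the first point and move by at most one grid step between neighbours, and compares this count with the crude product bound. This discrete model (points spaced $\varepsilon$ apart, values in $\varepsilon \mathbb{Z}$, $x_0$ at an endpoint) is an assumption made for illustration; it is not the exact family used in the proof.

```python
def count_grid_lipschitz_on_path(n: int) -> int:
    """Count g: {0..n-1} -> Z with g(0) = 0 and |g(i+1) - g(i)| <= 1 (units of eps)."""
    counts = {0: 1}                       # current value -> number of partial functions
    for _ in range(n - 1):
        nxt = {}
        for value, c in counts.items():
            for step in (-1, 0, 1):
                nxt[value + step] = nxt.get(value + step, 0) + c
        counts = nxt
    return sum(counts.values())


def crude_count(n: int, radius_over_eps: int) -> int:
    """Number of n-tuples allowed by the crude bound: (2 + 2 * floor(R/eps))^n."""
    return (2 + 2 * radius_over_eps) ** n
```

With `n = 10` points (so that $R/\varepsilon = 9$), the first function returns $3^9 = 19683$ while the crude count is $20^{10}$: only a tiny fraction of the tuples correspond to an admissible function.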

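Proposition B.2. If $K$ is connected, then

$$\mathcal{N}(\mathcal{F}, \varepsilon) \le \Big(2 + 2 \Big\lfloor \frac{4R}{\varepsilon} \Big\rfloor\Big) 2^{\mathcal{N}(K, \varepsilon/16)}. \tag{22}$$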

Remark. The simple idea in this proposition is first to bring the problem to a 
discrete metric space (graph), and then to bound the number of Lipschitz functions 
on this graph by the number of Lipschitz functions on a spanning tree of the graph. 

Proof. In the following, we will denote $n = \mathcal{N}(K, \varepsilon)$ for simplicity. Let $x_i$, $1 \le i \le n$, be the centers of a set of $n$ balls $B_1, \dots, B_n$ covering $K$. Consider the graph $G$ built on the $n$ vertices $a_1, \dots, a_n$, where vertices $a_i$ and $a_j$ are connected if and only if $i \ne j$ and the balls $B_i$ and $B_j$ have a non-empty intersection.

Lemma B.3. The graph G is connected. Moreover, there exists a subgraph G' with 
the same set of vertices and whose edges are edges of G, which is a tree. 

Proof. Suppose that $G$ were not connected. Upon exchanging the labels of the balls, there would exist $k$, $1 \le k < n$, such that for $i \le k < j$ the balls $B_i$ and $B_j$ have empty intersection. But then $K$ would be equal to the disjoint union of the sets $\bigcup_{i=1}^k B_i$ and $\bigcup_{j=k+1}^n B_j$, which are both closed and non-empty, contradicting the connectedness of $K$.

The second part of the claim is obtained by an easy induction on the size of the 
graph. □ 

Introduce the set $A$ of functions $g : \{a_1, \dots, a_n\} \to \mathbb{R}$ such that $g(a_1) = 0$ and $|g(a_i) - g(a_j)| = 4\varepsilon$ whenever $a_i$ and $a_j$ are connected in $G'$. Using the fact that $G'$ is a tree, it is easy to see that $A$ contains at most $2^n$ elements.
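The counting claim can be verified by brute force on a small tree: once the value at the root is fixed, each edge contributes one sign choice. The sketch below enumerates all sign patterns and counts the distinct functions obtained; the vertex labelling (root `0`), the edge-list representation, and the function name are assumptions made for this example.

```python
from itertools import product


def count_functions_on_tree(n, edges, step=4):
    """Count g: {0..n-1} -> Z with g(0) = 0 and |g(i) - g(j)| = step on every edge.

    The input graph is assumed to be a tree, so choosing one sign per edge
    determines g uniquely by propagation from the root.
    """
    adjacency = {i: [] for i in range(n)}
    for idx, (i, j) in enumerate(edges):
        adjacency[i].append((j, idx))
        adjacency[j].append((i, idx))
    found = set()
    for signs in product((-1, 1), repeat=len(edges)):
        values, stack = {0: 0}, [0]
        while stack:                                  # propagate values from the root
            u = stack.pop()
            for v, idx in adjacency[u]:
                if v not in values:
                    values[v] = values[u] + signs[idx] * step
                    stack.append(v)
        found.add(tuple(values[i] for i in range(n)))
    return len(found)
```

On a tree with $n$ vertices this returns $2^{n-1}$, which is indeed at most $2^n$.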

Define a partition of $K$ by setting $C_1 = B_1$, $C_2 = B_2 \setminus C_1$, ..., $C_n = B_n \setminus C_{n-1}$ (remark that none of the $C_i$ is empty since the $B_i$ are supposed to constitute a minimal covering). Also fix for each $i$, $1 \le i \le n$, a point $y_i \in C_i$ (choosing $y_1 = x_1$). Notice that $C_i$ is included in the ball of center $y_i$ and radius $2\varepsilon$, and that $d(y_i, y_j) \le 4\varepsilon$ whenever $a_i$ and $a_j$ are connected in $G$ (and therefore in $G'$).

To every 1-Lipschitz function $f : K \to \mathbb{R}$ we associate $T(f) : \{a_1, \dots, a_n\} \to \mathbb{R}$ defined by $T(f)(a_i) = f(y_i)$. For any $x \in K$, and $f_1, f_2 \in \mathcal{F}$ we have the following

$$\begin{aligned}
|f_1(x) - f_2(x)| &\le |f_1(x) - f_1(y_i)| + |f_1(y_i) - f_2(y_i)| + |f_2(y_i) - f_2(x)| \\
&\le 4\varepsilon + \|T(f_1) - T(f_2)\|_{\ell^\infty(G')}
\end{aligned}$$

where $i$ is such that $x \in C_i$. We now make the following claim:

Lemma B.4. For every 1-Lipschitz function $f : K \to \mathbb{R}$ such that $f(y_1) = 0$, there exists $g \in A$ such that $\|T(f) - g\|_{\ell^\infty(G')} \le 4\varepsilon$.

Assume for the moment that this holds. As there are at most $2^n$ functions in $A$, it is possible to choose at most $2^n$ 1-Lipschitz functions $f_1, \dots, f_{2^n}$ vanishing at $x_1$ such that for any 1-Lipschitz function $f$ vanishing at $x_1$ there exists $f_i$ such that $\|T(f) - T(f_i)\|_{\ell^\infty(G')} \le 8\varepsilon$. Using the inequality above, this implies that the balls of center $f_i$ and radius $12\varepsilon$ for the uniform distance cover the set of 1-Lipschitz functions vanishing at $x_1$.

Finally, consider $f \in \mathcal{F}$. We may write

$$f = f - f(x_1) + f(x_1)$$

and observe that on the one hand, $f - f(x_1)$ is a 1-Lipschitz function vanishing at $x_1$, and that on the other hand, $|f(x_1)| \le R$. Thus the set $\mathcal{F}$ is covered by the balls of center $f_i + 4k\varepsilon$ and radius $16\varepsilon$, where $-\lfloor \frac{R}{4\varepsilon} \rfloor - 1 \le k \le \lfloor \frac{R}{4\varepsilon} \rfloor$. There are at most $(2 + 2\lfloor \frac{R}{4\varepsilon} \rfloor) 2^n$ such balls, which proves the desired result.

□ 



We now prove Lemma B.4.

Proof. Let us use induction again. If $K = B_1$ then $T(f) = 0$ and the property is straightforward. Now if $K = C_1 \cup \dots \cup C_n$, we may assume without loss of generality that $a_n$ is a leaf in $G'$, that is a vertex with exactly one neighbor, and that it is connected to $a_{n-1}$. By hypothesis there exists $\tilde{g} : \{a_1, \dots, a_{n-1}\} \to \mathbb{R}$ such that $|\tilde{g}(a_i) - \tilde{g}(a_j)| = 4\varepsilon$ whenever $a_i$ and $a_j$ are connected in $G'$, and $|\tilde{g}(a_i) - f(y_i)| \le 4\varepsilon$ for $1 \le i \le n - 1$. Set $g = \tilde{g}$ on $\{a_1, \dots, a_{n-1}\}$, and

• $g(a_n) = \tilde{g}(a_{n-1}) + 4\varepsilon$ if $f(y_n) - \tilde{g}(a_{n-1}) \ge 0$,

• $g(a_n) = \tilde{g}(a_{n-1}) - 4\varepsilon$ otherwise.

Since

$$|f(y_n) - \tilde{g}(a_{n-1})| \le |f(y_n) - f(y_{n-1})| + |f(y_{n-1}) - \tilde{g}(a_{n-1})| \le 8\varepsilon$$

it is easily checked that $|f(y_n) - g(a_n)| \le 4\varepsilon$. The function $g$ belongs to $A$ and our claim is proved.

□ 
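The induction in this proof is constructive: processing the vertices of the tree from the root outward, one moves $g$ by $4\varepsilon$ in the direction of $f$ at each new vertex. The sketch below implements this under the assumptions that the tree is given by a `parent` array with `parent[i] < i` (vertex `0` being the root $a_1$) and that the input values satisfy $f(y_1) = 0$ and $|f(y_i) - f(y_{\mathrm{parent}(i)})| \le 4\varepsilon$; the representation and the function name are illustrative choices.

```python
def build_g(parent, f_values, eps):
    """Return g with |g(i) - g(parent[i])| = 4 eps on every edge and |g - f| <= 4 eps.

    parent[0] must be None, and parent[i] < i for i >= 1, so that every vertex
    is handled after its parent (this mirrors adding leaves one at a time).
    """
    g = [0.0] * len(parent)
    for i in range(1, len(parent)):
        base = g[parent[i]]
        # step towards f: up if f lies at or above the parent's value of g, down otherwise
        g[i] = base + 4 * eps if f_values[i] - base >= 0 else base - 4 * eps
    return g
```

For example, with `parent = [None, 0, 1, 1, 3]`, `eps = 0.5` and `f_values = [0.0, 1.5, 3.0, -0.5, 1.0]`, the returned function stays within $4\varepsilon = 2$ of `f_values` at every vertex.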

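Appendix C. Hölder moments of stochastic processes

We quote the following result from Revuz and Yor's book [28] (actually the value of the constant is not given in their statement but is easily tracked in the proof).

Theorem C.1. Let $X_t$, $t \in [0, 1]$ be a Banach-valued process such that there exist $\gamma, \varepsilon, c > 0$ with

$$E[|X_t - X_s|^\gamma] \le c |t - s|^{1 + \varepsilon},$$

then there exists a modification $\tilde{X}$ of $X$ such that

$$E\Big[\Big(\sup_{s \ne t} \frac{|\tilde{X}_t - \tilde{X}_s|}{|t - s|^\alpha}\Big)^\gamma\Big]^{1/\gamma} \le \frac{2^{1+\alpha} (2c)^{1/\gamma}}{1 - 2^{\alpha - \varepsilon/\gamma}}$$

for all $0 \le \alpha < \varepsilon/\gamma$.

Corollary C.2. Let $(B_t)_{0 \le t \le 1}$ denote the standard Brownian motion on $[0, 1]$. Let $m_\alpha = E \sup_t \|B_t\|_\alpha$ and $s_\alpha^2 = E(\sup_t \|B_t\|_\alpha)^2$, then $m_\alpha$ and $s_\alpha$ are bounded by

$$C_\alpha = 2^{1+\alpha} \, \frac{2^{(1 - 2\alpha)/4}}{1 - 2^{(2\alpha - 1)/4}} \, \|Z\|_{4/(1 - 2\alpha)}$$

where $\|Z\|_p$ denotes the $L^p$ norm of a $\mathcal{N}(0, 1)$ variable $Z$.

Proof. Since the increments of the Brownian motion are Gaussian, we have for every $p > 0$

$$E[|B_t - B_s|^{2p}] = K_p |t - s|^p$$

with $K_p = (2\pi)^{-1/2} \int_{-\infty}^{+\infty} |x|^{2p} e^{-x^2/2} dx$. Choose $p$ such that $\alpha < (p - 1)/2p$, then by Theorem C.1

$$E\Big[\Big(\sup_{s \ne t} \frac{|B_t - B_s|}{|t - s|^\alpha}\Big)^{2p}\Big]^{1/2p} \le \frac{2^{1+\alpha} (2 K_p)^{1/2p}}{1 - 2^{\alpha - (p-1)/2p}}.$$

A suitable choice is $1/p = 1/2 - \alpha$, and the right-hand side becomes

$$C_\alpha = \frac{2^{1+\alpha} (2 K_p)^{(1/2 - \alpha)/2}}{1 - 2^{(\alpha - 1/2)/2}}$$

with $K_p = (2\pi)^{-1/2} \int_{-\infty}^{+\infty} |x|^{2/(1/2 - \alpha)} e^{-x^2/2} dx$. By Hölder's inequality, the result follows.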
□ 

Corollary C.3. Let $X_t$ be the solution on $[0, T]$ of

$$dX_t = \sigma(X_t) dB_t + b(X_t) dt$$

with $\sigma, b : \mathbb{R} \to \mathbb{R}$ locally Lipschitz and satisfying the following hypotheses:

• $\sup_x |\mathrm{tr}\, \sigma(x)^t \sigma(x)| \le A$,

• $\sup_x |b(x)| \le B$.

Then for $\alpha < 1/2$, for $p$ such that $\alpha < (p - 1)/2p$, there exists $C < +\infty$ depending explicitly on $A$, $B$, $T$, $\alpha$, $p$ such that

E\\X\\l < C. 

Proof. We first apply Itô's formula to the function $|X_t - X_s|^2$: this yields

$$E|X_t - X_s|^2 \le 2B \int_s^t E|X_u - X_s| \, du + A|t - s|.$$

Using the elementary inequality $x \le \frac{1}{2}(1 + x^2)$, we get

$$E|X_t - X_s|^2 \le B \int_s^t E|X_u - X_s|^2 \, du + (A + B)|t - s|.$$

Gronwall's lemma entails

$$E|X_t - X_s|^2 \le (A + B) e^{BT} |t - s|.$$

Likewise, applying Itô's formula to $|X_t - X_s|^4$, we get

$$\begin{aligned}
E|X_t - X_s|^4 &\le 4B \int_s^t E|X_u - X_s|^3 \, du + 6A \int_s^t E|X_u - X_s|^2 \, du \\
&\le (6A + 2B) \int_s^t E|X_u - X_s|^2 \, du + 2B \int_s^t E|X_u - X_s|^4 \, du \\
&\le \frac{1}{2}(6A + 2B)(A + B) e^{BT} |t - s|^2 + 2B \int_s^t E|X_u - X_s|^4 \, du
\end{aligned}$$

and by Gronwall's lemma $E|X_t - X_s|^4 \le \frac{1}{2}(6A + 2B)(A + B) e^{3BT} |t - s|^2$. By an easy recurrence, following the above, one may show that

$$E|X_t - X_s|^{2p} \le C(A, B, T, p) |t - s|^p.$$

To conclude it suffices to call on Theorem C.1.

□ 



Appendix D. Transportation inequalities for Gaussian measures on a 

Banach space 

Lemma D.1. Let $(E, \mu)$ be a Gaussian Banach space, and define $m = \int \|x\| \mu(dx)$. Also let $\sigma^2$ denote the weak variance of $\mu$. The tail of $\mu$ is bounded as follows: for all $R > 0$,

$$\mu\{x \in E, \ \|x\| \ge m + R\} \le e^{-R^2/2\sigma^2}.$$



Finally we collect some (loose) results on the Wiener measure on the Banach space $(C([0, 1], \mathbb{R}), \|\cdot\|_\infty)$.

Lemma D.2. The Wiener measure satisfies a $T_2(8)$ inequality (and therefore a $T_1(8)$ inequality).

Proof. The Wiener measure satisfies the $T_2(2\sigma^2)$ inequality, where

$$\sigma^2 = \sup_\mu E\Big(\int_0^1 B_s \, d\mu(s)\Big)^2$$

and the supremum runs over all Radon measures on $[0, 1]$ with total variation bounded by 1. Note that the weak variance $\sigma^2$ is bounded by the variance $s^2$ defined as $s^2 = E(\sup_t |B_t|)^2$ (here and hereafter $\sup_t |B_t|$ refers to the supremum on $[0, 1]$). In turn we can give a (quite crude) bound on $s$: write $\sup_t |B_t| \le \sup_t B_t - \inf_t B_t$, thus $(\sup_t |B_t|)^2 \le (\sup_t B_t - \inf_t B_t)^2 \le 2(\sup_t B_t)^2 + 2(-\inf_t B_t)^2$. Remember the well-known fact that $\sup_t B_t$, $-\inf_t B_t$ and $|B_1|$ have the same law, so that

$$E(\sup_t |B_t|)^2 \le 4 E|B_1|^2 = 4.$$
t 



□ 



Lemma D.3. Let $\gamma$ denote the Wiener measure. For $a = \sqrt{2 \log 2}/3$, we have

$$\int e^{a \|x\|_\infty} \gamma(dx) \le 2.$$

Proof. We have

$$\int e^{a \|x\|_\infty} \gamma(dx) = \int_0^{+\infty} P(e^{a \|B\|_\infty} \ge t) \, dt$$



-\-oo 
-\-oo 



where $\tau_u$ is the stopping time $\inf\{t, \ |B_t| = u\}$. It is a simple exercise to compute $E e^{-\lambda^2 \tau_u / 2} = 1/\cosh(\lambda u) \le 2 e^{-\lambda u}$.

This yields 



A — a 

We can choose $\lambda = 3a$ to get $\int e^{a \|x\|_\infty} \gamma(dx) \le e^{9a^2/2}$. In turn it implies the desired result.

□